Search Results: "angel"

30 September 2023

Fran ois Marier: Things I do after uploading a new package to Debian

There are a couple of things I tend to do after packaging a piece of software for Debian, filing an Intent To Package bug and uploading the package. This is both a checklist for me and (hopefully) a way to inspire other maintainers to go beyond the basic package maintainer duties as documented in the Debian Developer's Reference. If I've missed anything, please leave an comment or send me an email!

Salsa for collaborative development To foster collaboration and allow others to contribute to the packaging, I upload my package to a new subproject on Salsa. By doing this, I enable other Debian contributors to make improvements and propose changes via merge requests. I also like to upload the project logo in the settings page (i.e. https://salsa.debian.org/debian/packagename/edit) since that will show up on some dashboards like the Package overview.

Launchpad for interacting with downstream Ubuntu users While Debian is my primary focus, I also want to keep an eye on how my package is doing on derivative distributions like Ubuntu. To do this, I subscribe to bugs related to my package on Launchpad. Ubuntu bugs are rarely Ubuntu-specific and so I will often fix them in Debian. I also set myself as the answer contact on Launchpad Answers since these questions are often the sign of a Debian or a lack of documentation. I don't generally bother to fix bugs on Ubuntu directly though since I've not had much luck with packages in `universe` lately. I'd rather not spend much time preparing a package that's not going to end up being released to users as part of a Stable Release Update. On the other hand, I have succesfully requested simple Debian syncs when an important update was uploaded after the Debian Import Freeze.

Screenshots and tags I take screenshots of my package and upload them on https://screenshots.debian.net to help users understand what my package offers and how it looks. I believe that these screenshots end up in software "stores" type of applications. Similarly, I add tags to my package using https://debtags.debian.org. I'm not entirely sure where these tags are used, but they are visible from `apt show packagename`.

Monitoring Upstream Releases Staying up-to-date with upstream releases is one of the most important duties of a software packager. There are a lot of different ways that upstream software authors publicize their new releases. Here are some of the things I do to monitor these releases:

I have a cronjob which run uscan once a day to check for new upstream releases using the information specified in my debian/watch files:

  0 12 * * 1-5   francois  test -e /home/francois/devel/deb && HTTPS_PROXY= https_proxy= uscan --report /home/francois/devel/deb   true

I subscribe to the upstream project's releases RSS feed, if available. For example, I subscribe to the GitHub tags feed for git-secrets and Launchpad announcements for email-reminder.
If the upstream project maintains an announcement mailing list, I subscribe to it (e.g. rkhunter-announce or tor release announcements).

When nothing else is available, I write a cronjob that downloads the upstream changelog once a day and commits it to a local git repo:

#!/bin/bash
pushd /home/francois/devel/zlib-changelog > /dev/null
wget --quiet -O ChangeLog.txt https://zlib.net/ChangeLog.txt   exit 1
git diff
git commit -a -m "Updated changelog" > /dev/null
popd > /dev/null

This sends me a diff by email when a new release is added (and no emails otherwise).

24 September 2023

Sahil Dhiman: Abraham Raji

Man, you re no longer with us, but I am touched by the number of people you have positively impacted. Almost every DebConf23 presentations by locals I saw after you, carried how you were instrumental in bringing them there. How you were a dear friend and brother. It s a weird turn of events, that you left us during one thing we deeply cared and worked towards making possible since the last 3 years together. Who would have known, that Sahil, I m going back to my apartment tonight and casual bye post that would be the last conversation we ever had. Things were terrible after I heard the news. I had a hard time convincing myself to come see you one last time during your funeral. That was the last time I was going to get to see you, and I kept on looking at you. You, there in front of me, all calm, gave me peace. I ll carry that image all my life now. Your smile will always remain with me. Now, who ll meet and receive me on the door at almost every Debian event (just by sheer co-incidence?). Who ll help me speak out loud about all the Debian shortcomings (and then discuss solutions, when sober :)). Abraham and me during Debian discussion in DebUtsav Kochi

Abraham and me during Debian discussion in DebUtsav Kochi

It was a testament of the amount of time we had already spent together online, that when we first met during MDC Palakkad, it didn t feel we were physically meeting for the first time. The conversations just flowed. Now this song is associated with you due to your speech during post MiniDebConf Palakkad dinner. Hearing it reminds me of all the times we spent together chilling and talking community (which you cared deeply about). I guess, now we can t stop caring for the community, because your energy was contagious. Now, I can t directly dial your number to listen - Hey Sahil! What s up? from the other end, or Tell me, tell me on any mention of the problem. Nor would I be able to send reference usage of your Debian packaging guide in the wild. You already know how popular this guide of yours. How many people that guide has helped with getting started with packaging. Our last telegram text was me telling you about guide usage in Ravi s DebConf23 presentation. Did I ever tell you, I too got my first start with packaging from there. I started looking up to you from there, even before we met or talked. Now, I missed telling you, I was probably your biggest fan whenever you had the mic in hand and started speaking. You always surprised me all the insights and idea you brought and would kept on impressing me for someone who was just my age but was way more mature. Reading recent toots from Raju Dev made me realize how much I loved your writings. You wrote How the Future will remember Us , Doing what s right and many more. The level of depth in your thought was unparalleled. I loved reading those. That s why I kept pestering you to write more, which you slowly stopped. Now I fully understand why though. You were busy; really busy helping people out or just working for making things better. You were doing Debian, upstream projects, web development, designs, graphics, mentoring, free software evangelism while being the go-to person for almost everyone around. Everyone depended on you, because you were too kind to turn down anyone. Abraham and me just chilling around. We met for the first time there

Abraham and me just chilling around. We met for the first time there

Man, I still get your spelling wrong :) Did I ever tell you that? That was the reason, I used to use AR instead online. You ll be missed and will always be part of our conversations, because you have left a profound impact on me, our friends, Debian India and everyone around. See you! the coolest man around. In memory:

Farewell Abraham video by Vysakh Premkumar
Tails project dedicating 5.17.1 release
Hopscotch team dedicating 2023.8.1 release
Athul dedicating 0.2.5 release of waka-readme
Debian project

PS - Just found you even had a Youtube channel, you one heck of a talented man.

22 September 2023

Ravi Dwivedi: Debconf23

Introduction DebConf23, the 24th annual Debian Conference, was held in India in the city of Kochi, Kerala from the 3rd to the 17th of September, 2023. Ever since I got to know about it (which was more than an year ago), I was excited to attend DebConf in my home country. This was my second DebConf, as I attended one last year in Kosovo. I was very happy that I didn t need to apply for a visa to attend. I got full bursary to attend the event (thanks a lot to Debian for that!) which is always helpful in covering the expenses, especially if the venue is a five star hotel :) For the conference, I submitted two talks. One was suggested by Sahil on Debian packaging for beginners, while the other was suggested by Praveen who opined that a talk covering broader topics about freedom in self-hosting services will be better, when I started discussing about submitting a talk about prav app project. So I submitted one on Debian packaging for beginners and the other on ideas on sustainable solutions for self-hosting. My friend Suresh - who is enthusiastic about Debian and free software - wanted to attend the DebConf as well. When the registration started, I reminded him about applying. We landed in Kochi on the 28th of August 2023 during the festival of Onam. We celebrated Onam in Kochi, had a trip to Wayanad, and returned to Kochi. On the evening of the 3rd of September, we reached the venue - Four Points Hotel by Sheraton, at Infopark Kochi, Ernakulam, Kerala, India.
Suresh and me celebrating Onam in Kochi.

Hotel overview The hotel had 14 floors, and featured a swimming pool and gym (these were included in our package). The hotel gave us elevator access for only our floor, along with public spaces like the reception, gym, swimming pool, and dining areas. The temperature inside the hotel was pretty cold and I had to buy a jacket to survive. Perhaps the hotel was in cahoots with winterwear companies? :)
Four Points Hotel by Sheraton was the venue of DebConf23. Photo credits: Bilal

Photo of the pool. Photo credits: Andreas Tille.

View from the hotel window.

Meals On the first day, Suresh and I had dinner at the eatery on the third floor. At the entrance, a member of the hotel staff asked us about how many people we wanted a table for. I told her that it s just the two of us at the moment, but (as we are attending a conference) we might be joined by others. Regardless, they gave us a table for just two. Within a few minutes, we were joined by Alper from Turkey and urbec from Germany. So we shifted to a larger table but then we were joined by even more people, so we were busy adding more chairs to our table. urbec had already been in Kerala for the past 5-6 days and was, on one hand, very happy already with the quality and taste of bananas in Kerala and on the other, rather afraid of the spicy food :) Two days later, the lunch and dinner were shifted to the All Spice Restaurant on the 14th floor, but the breakfast was still served at the eatery. Since the eatery (on the 3rd floor) had greater variety of food than the other venue, this move made breakfast the best meal for me and many others. Many attendees from outside India were not accustomed to the spicy food. It is difficult for locals to help them, because what we consider mild can be spicy for others. It is not easy to satisfy everyone at the dining table, but I think the organizing team did a very good job in the food department. (That said, it didn t matter for me after a point, and you will know why.) The pappadam were really good, and I liked the rice labelled Kerala rice . I actually brought that exact rice and pappadam home during my last trip to Kochi and everyone at my home liked it too (thanks to Abhijit PA). I also wished to eat all types of payasams from Kerala and this really happened (thanks to Sruthi who designed the menu). Every meal had a different variety of payasam and it was awesome, although I didn t like some of them, mostly because they were very sweet. Meals were later shifted to the ground floor (taking away the best breakfast option which was the eatery).
This place served as lunch and dinner place and later as hacklab during debconf. Photo credits: Bilal

The excellent Swag Bag The DebConf registration desk was at the second floor. We were given a very nice swag bag. They were available in multiple colors - grey, green, blue, red - and included an umbrella, a steel mug, a multiboot USB drive by Mostly Harmless, a thermal flask, a mug by Canonical, a paper coaster, and stickers. It rained almost every day in Kochi during our stay, so handing out an umbrella to every attendee was a good idea.
Picture of the awesome swag bag given at DebConf23. Photo credits: Ravi Dwivedi

A gift for Nattie During breakfast one day, Nattie (Belgium) expressed the desire to buy a coffee filter. The next time I went to the market, I bought a coffee filter for her as a gift. She seemed happy with the gift and was flattered to receive a gift from a young man :)

Being a mentor There were many newbies who were eager to learn and contribute to Debian. So, I mentored whoever came to me and was interested in learning. I conducted a packaging workshop in the bootcamp, but could only cover how to set up the Debian Unstable environment, and had to leave out how to package (but I covered that in my talk). Carlos (Brazil) gave a keysigning session in the bootcamp. Praveen was also mentoring in the bootcamp. I helped people understand why we sign GPG keys and how to sign them. I planned to take a workshop on it but cancelled it later.

My talk My Debian packaging talk was on the 10th of September, 2023. I had not prepared slides for my Debian packaging talk in advance - I thought that I could do it during the trip, but I didn t get the time so I prepared them on the day before the talk. Since it was mostly a tutorial, the slides did not need much preparation. My thanks to Suresh, who helped me with the slides and made it possible to complete them in such a short time frame. My talk was well-received by the audience, going by their comments. I am glad that I could give an interesting presentation.
My presentation photo. Photo credits: Valessio

Visiting a saree shop After my talk, Suresh, Alper, and I went with Anisa and Kristi - who are both from Albania, and have a never-ending fascination for Indian culture :) - to buy them sarees. We took autos to Kakkanad market and found a shop with a great variety of sarees. I was slightly familiar with the area around the hotel, as I had been there for a week. Indian women usually don t try on sarees while buying - they just select the design. But Anisa wanted to put one on and take a few photos as well. The shop staff did not have a trial saree for this purpose, so they took a saree from a mannequin. It took about an hour for the lady at the shop to help Anisa put on that saree but you could tell that she was in heaven wearing that saree, and she bought it immediately :) Alper also bought a saree to take back to Turkey for his mother. Me and Suresh wanted to buy a kurta which would go well with the mundu we already had, but we could not find anything to our liking.
Selfie with Anisa and Kristi. Photo credits: Anisa.

Cheese and Wine Party On the 11th of September we had the Cheese and Wine Party, a tradition of every DebConf. I brought Kaju Samosa and Nankhatai from home. Many attendees expressed their appreciation for the samosas. During the party, I was with Abhas and had a lot of fun. Abhas brought packets of paan and served them at the Cheese and Wine Party. We discussed interesting things and ate burgers. But due to the restrictive alcohol laws in the state, it was less fun compared to the previous DebConfs - you could only drink alcohol served by the hotel in public places. If you bought your own alcohol, you could only drink in private places (such as in your room, or a friend s room), but not in public places.
Me helping with the Cheese and Wine Party.

Party at my room Last year, Joenio (Brazilian) brought pastis from France which I liked. He brought the same alocholic drink this year too. So I invited him to my room after the Cheese and Wine party to have pastis. My idea was to have them with my roommate Suresh and Joenio. But then we permitted Joenio to bring as many people as he wanted and he ended up bringing some ten people. Suddenly, the room was crowded. I was having good time at the party, serving them the snacks given to me by Abhas. The news of an alcohol party at my room spread like wildfire. Soon there were so many people that the AC became ineffective and I found myself sweating. I left the room and roamed around in the hotel for some fresh air. I came back after about 1.5 hours - for most part, I was sitting at the ground floor with TK Saurabh. And then I met Abraham near the gym (which was my last meeting with him). I came back to my room at around 2:30 AM. Nobody seemed to have realized that I was gone. They were thanking me for hosting such a good party. A lot of people left at that point and the remaining people were playing songs and dancing (everyone was dancing all along!). I had no energy left to dance and to join them. They left around 03:00 AM. But I am glad that people enjoyed partying in my room.
This picture was taken when there were few people in my room for the party.

Sadhya Thali On the 12th of September, we had a sadhya thali for lunch. It is a vegetarian thali served on a banana leaf on the eve of Thiruvonam. It wasn t Thiruvonam on this day, but we got a special and filling lunch. The rasam and payasam were especially yummy.
Sadhya Thali: A vegetarian meal served on banana leaf. Payasam and rasam were especially yummy! Photo credits: Ravi Dwivedi.

Sadhya thali being served at debconf23. Photo credits: Bilal

Day trip On the 13th of September, we had a daytrip. I chose the daytrip houseboat in Allepey. Suresh chose the same, and we registered for it as soon as it was open. This was the most sought-after daytrip by the DebConf attendees - around 80 people registered for it. Our bus was set to leave at 9 AM on the 13th of September. Me and Suresh woke up at 8:40 and hurried to get to the bus in time. It took two hours to reach the venue where we get the houseboat. The houseboat experience was good. The trip featured some good scenery. I got to experience the renowned Kerala backwaters. We were served food on the boat. We also stopped at a place and had coconut water. By evening, we came back to the place where we had boarded the boat.
Group photo of our daytrip. Photo credits: Radhika Jhalani

A good friend lost When we came back from the daytrip, we received news that Abhraham Raji was involved in a fatal accident during a kayaking trip. Abraham Raji was a very good friend of mine. In my Albania-Kosovo-Dubai trip last year, he was my roommate at our Tirana apartment. I roamed around in Dubai with him, and we had many discussions during DebConf22 Kosovo. He was the one who took the photo of me on my homepage. I also met him in MiniDebConf22 Palakkad and MiniDebConf23 Tamil Nadu, and went to his flat in Kochi this year in June. We had many projects in common. He was a Free Software activist and was the designer of the DebConf23 logo, in addition to those for other Debian events in India.
A selfie in memory of Abraham.
We were all fairly shocked by the news. I was devastated. Food lost its taste, and it became difficult to sleep. That night, Anisa and Kristi cheered me up and gave me company. Thanks a lot to them. The next day, Joenio also tried to console me. I thank him for doing a great job. I thank everyone who helped me in coping with the difficult situation. On the next day (the 14th of September), the Debian project leader Jonathan Carter addressed and announced the news officially. THe Debian project also mentioned it on their website. Abraham was supposed to give a talk, but following the incident, all talks were cancelled for the day. The conference dinner was also cancelled. As I write, 9 days have passed since his death, but even now I cannot come to terms with it.

Visiting Abraham s house On the 15th of September, the conference ran two buses from the hotel to Abraham s house in Kottayam (2 hours ride). I hopped in the first bus and my mood was not very good. Evangelos (Germany) was sitting opposite me, and he began conversing with me. The distraction helped and I was back to normal for a while. Thanks to Evangelos as he supported me a lot on that trip. He was also very impressed by my use of the StreetComplete app which I was using to edit OpenStreetMap. In two hours, we reached Abraham s house. I couldn t control myself and burst into tears. I went to see the body. I met his family (mother, father and sister), but I had nothing to say and I felt helpless. Owing to the loss of sleep and appetite over the past few days, I had no energy, and didn t think it was good idea for me to stay there. I went back by taking the bus after one hour and had lunch at the hotel. I withdrew my talk scheduled for the 16th of September.

A Japanese gift I got a nice Japanese gift from Niibe Yutaka (Japan) - a folder to keep papers which had ancient Japanese manga characters. He said he felt guilty as he swapped his talk with me and so it got rescheduled from 12th September to 16 September which I withdrew later.
Thanks to Niibe Yutaka (the person towards your right hand) from Japan (FSIJ), who gave me a wonderful Japanese gift during debconf23: A folder to keep pages with ancient Japanese manga characters printed on it. I realized I immediately needed that :)

This is the Japanese gift I received.

Group photo On the 16th of September, we had a group photo. I am glad that this year I was more clear in this picture than in DebConf22.
Click to enlarge

Volunteer work and talks attended I attended the training session for the video team and worked as a camera operator. The Bits from DPL was nice. I enjoyed Abhas presentation on home automation. He basically demonstrated how he liberated Internet-enabled home devices. I also liked Kristi s presentation on ways to engage with the GNOME community.
Bits from the DPL. Photo credits: Bilal

Kristi on GNOME community. Photo credits: Ravi Dwivedi.

Abhas' talk on home automation. Photo credits: Ravi Dwivedi.
I also attended lightning talks on the last day. Badri, Wouter, and I gave a demo on how to register on the Prav app. Prav got a fair share of advertising during the last few days.
I was roaming around with a QR code on my T-shirt for downloading Prav.

The night of the 17th of September Suresh left the hotel and Badri joined me in my room. Thanks to the efforts of Abhijit PA, Kiran, and Ananthu, I wore a mundu.
Me in mundu. Picture credits: Abhijith PA
I then joined Kalyani, Mangesh, Ruchika, Anisa, Ananthu and Kiran. We took pictures and this marked the last night of DebConf23.

Departure day The 18th of September was the day of departure. Badri slept in my room and left early morning (06:30 AM). I dropped him off at the hotel gate. The breakfast was at the eatery (3rd floor) again, and it was good. Sahil, Saswata, Nilesh, and I hung out on the ground floor.
From left: Nilesh, Saswata, me, Sahil. Photo credits: Sahil.
I had an 8 PM flight from Kochi to Delhi, for which I took a cab with Rhonda (Austria), Michael (Nigeria) and Yash (India). We were joined by other DebConf23 attendees at the Kochi airport, where we took another selfie.
Ruchika (taking the selfie) and from left to right: Yash, Joost (Netherlands), me, Rhonda
Joost and I were on the same flight, and we sat next to each other. He then took a connecting flight from Delhi to Netherlands, while I went with Yash to the New Delhi Railway Station, where we took our respective trains. I reached home on the morning of the 19th of September, 2023.
Joost and me going to Delhi. Photo credits: Ravi.

Big thanks to the organizers DebConf23 was hard to organize - strict alcohol laws, weird hotel rules, death of a close friend (almost a family member), and a scary notice by the immigration bureau. The people from the team are my close friends and I am proud of them for organizing such a good event. None of this would have been possible without the organizers who put more than a year-long voluntary effort to produce this. In the meanwhile, many of them had organized local events in the time leading up to DebConf. Kudos to them. The organizers also tried their best to get clearance for countries not approved by the ministry. I am also sad that people from China, Kosovo, and Iran could not join. In particular, I feel bad for people from Kosovo who wanted to attend but could not (as India does not consider their passport to be a valid travel document), considering how we Indians were so well-received in their country last year.

Note about myself I am writing this on the 22nd of September, 2023. It took me three days to put up this post - this was one of the tragic and hard posts for me to write. I have literally forced myself to write this. I have still not recovered from the loss of my friend. Thanks a lot to all those who helped me. PS: Credits to contrapunctus for making grammar, phrasing, and capitalization changes.

15 September 2023

John Goerzen: How Gapped is Your Air?

Sometimes we want better-than-firewall security for things. For instance:

An industrial control system for a municipal water-treatment plant should never have data come in or out
Or, a variant of the industrial control system: it should only permit telemetry and monitoring data out, and nothing else in or out
A system dedicated to keeping your GPG private keys secure should only have material to sign (or decrypt) come in, and signatures (or decrypted data) go out
A system keeping your tax records should normally only have new records go in, but may on occasion have data go out (eg, to print a copy of an old record)

In this article, I ll talk about the high side (the high-security or high-sensitivity systems) and the low side (the lower-sensitivity or general-purpose systems). For the sake of simplicity, I ll assume the high side is a single machine, but it could as well be a whole network. Let s focus on examples 3 and 4 to make things simpler. Let s consider the primary concern to be data exfiltration (someone stealing your data), with a secondary concern of data integrity (somebody modifying or destroying your data). You might think the safest possible approach is Airgapped that is, there is literal no physical network connection to the machine at all. This help! But then, the problem becomes: how do we deal with the inevitable need to legitimately get things on or off of the system? As I wrote in Dead USB Drives Are Fine: Building a Reliable Sneakernet, by using tools such as NNCP, you can certainly create a sneakernet : using USB drives as transport. While this is a very secure setup, as with most things in security, it s less than perfect. The Wikipedia airgap article discusses some ways airgapped machines can still be exploited. It mentions that security holes relating to removable media have been exploited in the past. There are also other ways to get data out; for instance, Debian ships with gensio and minimodem, both of which can transfer data acoustically. But let s back up and think about why we think of airgapped machines as so much more secure, and what the failure modes of other approaches might be.

What about firewalls? You could very easily set up high-side machine that is on a network, but is restricted to only one outbound TCP port. There could be a local firewall, and perhaps also a special port on an external firewall that implements the same restrictions. A variant on this approach would be two computers connected directly by a crossover cable, though this doesn t necessarily imply being more secure. Of course, the concern about a local firewall is that it could potentially be compromised. An external firewall might too; for instance, if your credentials to it were on a machine that got compromised. This kind of dual compromise may be unlikely, but it is possible. We can also think about the complexity in a network stack and firewall configuration, and think that there may be various opportunities to have things misconfigured or buggy in a system of that complexity. Another consideration is that data could be sent at any time, potentially making it harder to detect. On the other hand, network monitoring tools are commonplace. On the other hand, it is convenient and cheap. I use a system along those lines to do my backups. Data is sent, gpg-encrypted and then encrypted again at the NNCP layer, to the backup server. The NNCP process on the backup server runs as an untrusted user, and dumps the gpg-encrypted files to a secure location that is then processed by a cron job using Filespooler. The backup server is on a dedicated firewall port, with a dedicated subnet. The only ports allowed out are for NNCP and NTP, and offsite backups. There is no default gateway. Not even DNS is permitted out (the firewall does the appropriate redirection). There is one pinhole allowed out, where a subset of the backup data is sent offsite. I initially used USB drives as transport, and it had no network connection at all. But there were disadvantages to doing this for backups particularly that I d have no backups for as long as I d forget to move the drives. The backup system also would have clock drift, and the offsite backup picture was more challenging. (The clock drift was a problem because I use 2FA on the system; a password, plus a TOTP generated by a Yubikey) This is pretty good security, I d think. What are the weak spots? Well, if there were somehow a bug in the NNCP client, and the remote NNCP were compromised, that could lead to a compromise of the NNCP account. But this itself would accomplish little; some other vulnerability would have to be exploited on the backup server, because the NNCP account can t see plaintext data at all. I use borgbackup to send a subset of backup data offsite over ssh. borgbackup has to run as root to be able to access all the files, but the ssh it calls runs as a separate user. A ssh vulnerability is therefore unlikely to cause much damage. If, somehow, the remote offsite system were compromised and it was able to exploit a security issue in the local borgbackup, that would be a problem. But that sounds like a remote possibility. borgbackup itself can t even be used over a sneakernet since it is not asynchronous. A more secure solution would probably be using something like dar over NNCP. This would eliminate the ssh installation entirely, and allow a complete isolation between the data-access and the communication stacks, and notably not require bidirectional communication. Logic separation matters too. My Roundup of Data Backup and Archiving Tools may be helpful here. Other attack vectors could be a vulnerability in the kernel s networking stack, local root exploits that could be combined with exploiting NNCP or borgbackup to gain root, or local misconfiguration that makes the sandboxes around NNCP and borgbackup less secure. Because this system is in my basement in a utility closet with no chairs and no good place for a console, I normally manage it via a serial console. While it s a dedicated line between the system and another machine, if the other machine is compromised or an adversary gets access to the physical line, credentials (and perhaps even data) could leak, albeit slowly. But we can do much better with serial lines. Let s take a look.

Serial lines Some of us remember RS-232 serial lines and their once-ubiquitous DB-9 connectors. Traditionally, their speed maxxed out at 115.2Kbps. Serial lines have the benefit that they can be a direct application-to-application link. In my backup example above, a serial line could directly link the NNCP daemon on one system with the NNCP caller on another, with no firewall or anything else necessary. It is simply up to those programs to open the serial device appropriately. This isn t perfect, however. Unlike TCP over Ethernet, a serial line has no inherent error checking. Modern programs such as NNCP and ssh assume that a lower layer is making the link completely clean and error-free for them, and will interpret any corruption as an attempt to tamper and sever the connection. However, there is a solution to that: gensio. In my page Using gensio and ser2net, I discuss how to run NNCP and ssh over gensio. gensio is a generic framework that can add framing, error checking, and retransmit to an unreliable link such as a serial port. It can also add encryption and authentication using TLS, which could be particularly useful for applications that aren t already doing that themselves. More traditional solutions for serial communications have their own built-in error correction. For instance, UUCP and Kermit both were designed in an era of noisy serial lines and might be an excellent fit for some use cases. The ZModem protocol also might be, though it offers somewhat less flexibility and automation than Kermit. I have found that certain USB-to-serial adapters by Gearmo will actually run at up to 2Mbps on a serial line! Look for the ones on their spec pages with a FTDI chipset rated at 920Kbps. It turns out they can successfully be driven faster, especially if gensio s relpkt is used. I ve personally verified 2Mbps operation (Linux port speed 2000000) on Gearmo s USA-FTDI2X and the USA-FTDI4X. (I haven t seen any single-port options from Gearmo with the 920Kbps chipset, but they may exist). Still, even at 2Mbps, speed may well be a limiting factor with some applications. If what you need is a console and some textual or batch data, it s probably fine. If you are sending 500GB backup files, you might look for something else. In theory, this USB to RS-422 adapter should work at 10Mbps, but I haven t tried it. But if the speed works, running a dedicated application over a serial link could be a nice and fairly secure option. One of the benefits of the airgapped approach is that data never leaves unless you are physically aware of transporting a USB stick. Of course, you may not be physically aware of what is ON that stick in the event of a compromise. This could easily be solved with a serial approach by, say, only plugging in the cable when you have data to transfer.

Data diodes A traditional diode lets electrical current flow in only one direction. A data diode is the same concept, but for data: a hardware device that allows data to flow in only one direction. This could be useful, for instance, in the tax records system that should only receive data, or the industrial system that should only send it. Wikipedia claims that the simplest kind of data diode is a fiber link with transceivers connected in only one direction. I think you could go one simpler: a serial cable with only ground and TX connected at one end, wired to ground and RX at the other. (I haven t tried this.) This approach does have some challenges:

Many existing protocols assume a bidirectional link and won t be usable

There is a challenge of confirming data was successfully received. For a situation like telemetry, maybe it doesn t matter; another observation will come along in a minute. But for sending important documents, one wants to make sure they were properly received.

In some cases, the solution might be simple. For instance, with telemetry, just writing out data down the serial port in a simple format may be enough. For sending files, various mitigations, such as sending them multiple times, etc., might help. You might also look into FEC-supporting infrastructure such as blkar and flute, but these don t provide an absolute guarantee. There is no perfect solution to knowing when a file has been successfully received if the data communication is entirely one-way.

Audio transport I hinted above that minimodem and gensio both are software audio modems. That is, you could literally use speakers and microphones, or alternatively audio cables, as a means of getting data into or out of these systems. This is pretty limited; it is 1200bps, and often half-duplex, and could literally be disrupted by barking dogs in some setups. But hey, it s an option.

Airgapped with USB transport This is the scenario I began with, and named some of the possible pitfalls above as well. In addition to those, note also that USB drives aren t necessarily known for their error-free longevity. Be prepared for failure.

Concluding thoughts I wanted to lay out a few things in this post. First, that simply being airgapped is generally a step forward in security, but is not perfect. Secondly, that both physical and logical separation matter. And finally, that while tools like NNCP can make airgapped-with-USB-drive-transport a doable reality, there are also alternatives worth considering especially serial ports, firewalled hard-wired Ethernet, data diodes, and so forth. I think serial links, in particular, have been largely forgotten these days. Note: This article also appears on my website, where it may be periodically updated.

12 September 2023

John Goerzen: A Maze of Twisty Little Pixels, All Tiny

Two years ago, I wrote Managing an External Display on Linux Shouldn t Be This Hard. Happily, since I wrote that post, most of those issues have been resolved. But then you throw HiDPI into the mix and it all goes wonky. If you re running X11, basically the story is that you can change the scale factor, but it only takes effect on newly-launched applications (which means a logout/in because some of your applications you can t really re-launch). That is a problem if, like me, you sometimes connect an external display that is HiDPI, sometimes not, or your internal display is HiDPI but others aren t. Wayland is far better, supporting on-the-fly resizes quite nicely. I ve had two devices with HiDPI displays: a Surface Go 2, and a work-issued Thinkpad. The Surface Go 2 is my ultraportable Linux tablet. I use it sparingly at home, and rarely with an external display. I just put Gnome on it, in part because Gnome had better on-screen keyboard support at the time, and left it at that. On the work-issued Thinkpad, I really wanted to run KDE thanks to its tiling support (I wound up using bismuth with it). KDE was buggy with Wayland at the time, so I just stuck with X11 and ran my HiDPI displays at lower resolutions and lived with the fuzziness. But now that I have a Framework laptop with a HiDPI screen, I wanted to get this right. I tried both Gnome and KDE. Here are my observations with both: Gnome I used PaperWM with Gnome. PaperWM is a tiling manager with a unique horizontal ribbon approach. It grew on me; I think I would be equally at home, or maybe even prefer it, to my usual xmonad-style approach. Editing the active window border color required editing ~/.local/share/gnome-shell/extensions/paperwm@hedning:matrix.org/stylesheet.css and inserting background-color and border-color items in the paperwm-selection section. Gnome continues to have an absolutely terrible picture for configuring things. It has no less than four places to make changes (Settings, Tweaks, Extensions, and dconf-editor). In many cases, configuration for a given thing is split between Settings and Tweaks, and sometimes even with Extensions, and then there are sometimes options that are only visible in dconf. That is, where the Gnome people have even allowed something to be configurable. Gnome installs a power manager by default. It offers three options: performance, balanced, and saver. There is no explanation of the difference between them. None. What is it setting when I change the pref? A maximum frequency? A scaling governor? A balance between performance and efficiency cores? Not only that, but there s no way to tell it to just use performance when plugged in and balanced or saver when on battery. In an issue about adding that, a Gnome dev wrote We re not going to add a preference just because you want one . KDE, on the other hand, aside from not mucking with your system s power settings in this way, has a nice panel with on AC and on battery and you can very easily tweak various settings accordingly. The hostile attitude from the Gnome developers in that thread was a real turnoff. While Gnome has excellent support for Wayland, it doesn t (directly) support fractional scaling. That is, you can set it to 100%, 200%, and so forth, but no 150%. Well, unless you manage to discover that you can run gsettings set org.gnome.mutter experimental-features "['scale-monitor-framebuffer']" first. (Oh wait, does that make a FIFTH settings tool? Why yes it does.) Despite its name, that allows you to select fractional scaling under Wayland. For X11 apps, they will be blurry, a problem that is optional under KDE (more on that below). Gnome won t show the battery life time remaining on the task bar. Yikes. An extension might work in some cases. Not only that, but the Gnome battery icon frequently failed to indicate AC charging when AC was connected, a problem that didn t exist on KDE. Both Gnome and KDE support night light (warmer color temperatures at night), but Gnome s often didn t change when it should have, or changed on one display but not the other. The appindicator extension is pretty much required, as otherwise a number of applications (eg, Nextcloud) don t have their icon display anywhere. It does, however, generate a significant amount of log spam. There may be a fix for this. Unlike KDE, which has a nice inobtrusive popup asking what to do, Gnome silently automounts USB sticks when inserted. This is often wrong; for instance, if I m about to dd a Debian installer to it, I definitely don t want it mounted. I learned this the hard way. It is particularly annoying because in a GUI, there is no reason to mount a drive before the user tries to access it anyhow. It looks like there is a dconf setting, but then to actually mount a drive you have to open up Files (because OF COURSE Gnome doesn t have a nice removable-drives icon like KDE does) and it s a bunch of annoying clicks, and I didn t want to use the GUI file manager anyway. Same for unmounting; two clicks in KDE thanks to the task bar icon, but in Gnome you have to open up the file manager, unmount the drive, close the file manager again, etc. The ssh agent on Gnome doesn t start up for a Wayland session, though this is easily enough worked around. The reason I completely soured on Gnome is that after using it for awhile, I noticed my laptop fans spinning up. One core would be constantly busy. It was busy with a kworker events task, something to do with sound events. Logging out would resolve it. I believe it to be a Gnome shell issue. I could find no resolution to this, and am unwilling to tolerate the decreased battery life this implies. The Gnome summary: it looks nice out of the box, but you quickly realize that this is something of a paper-thin illusion when you try to actually use it regularly. KDE The KDE experience on Wayland was a little bit opposite of Gnome. While with Gnome, things start out looking great but you realize there are some serious issues (especially battery-eating), with KDE things start out looking a tad rough but you realize you can trivially fix them and wind up with a very solid system. Compared to Gnome, KDE never had a battery-draining problem. It will show me estimated battery time remaining if I want it to. It will do whatever I want it to when I insert a USB drive. It doesn t muck with my CPU power settings, and lets me easily define on AC vs on battery settings for things like suspend when idle. KDE supports fractional scaling, to any arbitrary setting (even with the gsettings thing above, Gnome still only supports it in 25% increments). Then the question is what to do with X11-only applications. KDE offers two choices. The first is Scaled by the system , which is also the only option for Gnome. With that setting, the X11 apps effectively run natively at 100% and then are scaled up within Wayland, giving them a blurry appearance on HiDPI displays. The advantage is that the scaling happens within Wayland, so the size of the app will always be correct even when the Wayland scaling factor changes. The other option is Apply scaling themselves , which uses native X11 scaling. This lets most X11 apps display crisp and sharp, but then if the system scaling changes, due to limitations of X11, you ll have to restart the X apps to get them to be the correct size. I appreciate the choice, and use Apply scaling by themselves because only a few of my apps aren t Wayland-aware. I did encounter a few bugs in KDE under Wayland: sddm, the display manager, would be slow to stop and cause a long delay on shutdown or reboot. This seems to be a known issue with sddm and Wayland, and is easily worked around by adding a systemd TimeoutStopSec. Konsole, the KDE terminal emulator, has weird display artifacts when using fractional scaling under Wayland. I applied some patches and rebuilt Konsole and then all was fine. The Bismuth tiling extension has some pretty weird behavior under Wayland, but a 1-character patch fixes it. On Debian, KDE mysteriously installed Pulseaudio instead of Debian s new default Pipewire, but that was easily fixed as well (and Pulseaudio also works fine). Conclusions I m sticking with KDE. Given that I couldn t figure out how to stop Gnome from deciding to eat enough battery to make my fan come on, the decision wasn t hard. But even if it weren t for that, I d have gone with KDE. Once a couple of things were patched, the experience is solid, fast, and flawless. Emacs (my main X11-only application) looks great with the self-scaling in KDE. Gimp, which I use occasionally, was terrible with the blurry scaling in Gnome. Update: Corrected the gsettings command

11 September 2023

John Goerzen: For the First Time In Years, I m Excited By My Computer Purchase

Some decades back, when I d buy a new PC, it would unlock new capabilities. Maybe AGP video, or a PCMCIA slot, or, heck, sound. Nowadays, mostly new hardware means things get a bit faster or less crashy, or I have some more space for files. It s good and useful, but sorta meh. Not this purchase. Cory Doctorow wrote about the Framework laptop in 2021:

There s no tape. There s no glue. Every part has a QR code that you can shoot with your phone to go to a service manual that has simple-to-follow instructions for installing, removing and replacing it. Every part is labeled in English, too! The screen is replaceable. The keyboard is replaceable. The touchpad is replaceable. Removing the battery and replacing it takes less than five minutes. The computer actually ships with a screwdriver.

Framework had been on my radar for awhile. But for various reasons, when I was ready to purchase, I didn t; either the waitlist was long, or they didn t have the specs I wanted. Lately my aging laptop with 8GB RAM started OOMing (running out of RAM). My desktop had developed a tendency to hard hang about once a month, and I researched replacing it, but the cost was too high to justify. But when I looked into the Framework, I thought: this thing could replace both. It is a real shift in perspective to have a laptop that is nearly as upgradable as a desktop, and can be specced out to exactly what I wanted: 2TB storage and 64GB RAM. And still cheaper than a Macbook or Thinkpad with far lower specs, because the Framework uses off-the-shelf components as much as possible. Cory Doctorow wrote, in The Framework is the most exciting laptop I ve ever broken:

The Framework works beautifully, but it fails even better Framework has designed a small, powerful, lightweight machine it works well. But they ve also designed a computer that, when you drop it, you can fix yourself. That attention to graceful failure saved my ass.

I like small laptops, so I ordered the Framework 13. I loaded it up with the 64GB RAM and 2TB SSD I wanted. Frameworks have four configurable ports, which are also hot-swappable. I ordered two USB-C, one USB-A, and one HDMI. I put them in my preferred spots (one USB-C on each side for easy docking and charging). I put Debian on it, and it all Just Worked. Perfectly. Now, I orderd the DIY version. I hesitated about this I HATE working with laptops because they re all so hard, even though I KNEW this one was different but went for it, because my preferred specs weren t available in a pre-assembled model. I m glad I did that, because assembly was actually FUN. I got my box. I opened it. There was the bottom shell with the motherboard and CPU installed. Here are the RAM sticks. There s the SSD. A minute or two with each has them installed. Put the bezel on the screen, attach the keyboard it has magnets to guide it into place and boom, ready to go. Less than 30 minutes to assemble a laptop nearly from scratch. It was easier than assembling most desktops. So now, for the first time, my main computing device is a laptop. Rather than having a desktop and a laptop, I just have a laptop. I ll be able to upgrade parts of it later if I want to. I can rearrange the ports. And I can take all my most important files with me. I m quite pleased!

24 August 2023

Lukas M rdian: Netplan v0.107 is now available

I m happy to announce that Netplan version 0.107 is now available on GitHub and is soon to be deployed into a Linux installation near you! Six months and more than 200 commits after the previous version (including a .1 stable release), this release is brought to you by 8 free software contributors from around the globe.

Highlights Highlights of this release include the new configuration types for veth and dummy interfaces:

network:
  version: 2
  virtual-ethernets:
    veth0:
      peer: veth1
    veth1:
      peer: veth0
  dummy-devices:
    dm0:
      addresses:
        - 192.168.0.123/24
      ...

Furthermore, we implemented CFFI based Python bindings on top of libnetplan s API, that can easily be consumed by 3rd party applications (see full cffi-bindings.py example):

from netplan import Parser, State, NetDefinition
from netplan import NetplanException, NetplanParserException

parser = Parser()
# Parse the full, existing YAML config hierarchy
parser.load_yaml_hierarchy(rootdir='/')
# Validate the final parser state
state = State()
try:
    # validation of current state + new settings
    state.import_parser_results(parser)
except NetplanParserException as e:
    print('Error in', e.filename, 'Row/Col', e.line, e.column, '->', e.message)
except NetplanException as e:
    print('Error:', e.message)
# Walk through ethernet NetdefIDs in the state and print their backend
# renderer, to demonstrate working with NetDefinitionIterator &
# NetDefinition
for netdef in state.ethernets.values():
    print('Netdef', netdef.id, 'is managed by:', netdef.backend)
    print('Is it configured to use DHCP?', netdef.dhcp4 or netdef.dhcp6)

Changelog:

Support for dummy (`dummy-devices`) interfaces (LP#1774203) by @daniloegea in #361

Support for veth (`virtual-ethernets`) interfaces by @daniloegea in #368

Add Python bindings for libnetplan by @slyon in #385

netplan: Handle command exceptions by @daniloegea in #334

WPA3 (personal) support (LP#2023238) by @daniloegea in #369

Add all the commands to the bash completion file (LP#1749869) by @daniloegea in #326

New submodule for state manipulation by @daniloegea in #379

commands/status: show routes from all routing tables by @daniloegea in #390

cli:status: Make rich pretty printing optional by @slyon in #388

libnetplan: expose dhcp4 and dhcp6 properties by @daniloegea in #394

Expose macaddress and DNS configuration from the netdef by @daniloegea in #395

libnetplan: expose the routes list in the netdef by @daniloegea in #397

NetworkManager: Wireguard private key flag support by @daniloegea in #371

Add a netplan_parser_load_keyfile() Python binding by @daniloegea in #351

keyfile parser: add support for all tunnel types (LP#2016473) by @daniloegea in #360

parse-nm:wg: add support for reading the listen-port property by @daniloegea in #372

parse-nm: add support for VRF devices by @daniloegea in #398

Vlan keyfile parser support by @daniloegea in #370

Netplan docs rework by @daniloegea in #333 & #337

docs: Add a short netplan-everywhere howto by @daniloegea in #325

doc: make us of sphinx copybutton plugin by @slyon in #354

doc: Add Ubuntu Code of Conduct 2.0 by @slyon in #355

doc: Explanation about 00-network-manager-all.yaml by @slyon in #378

Bug fixes:

Fix FTBFS on Fedora and refresh RPM packaging by @Conan-Kudo in #323

parser: validate lacp-rate properly (LP#1745648) by @daniloegea in #324

meson: use meson-make-symlink.sh helper script instead of install_symlink() by @slyon in #327

netplan: cli: fix typo from unkown to unknown by @AristoChen in #328

Handle duplication during parser second pass (LP#2007682) by @daniloegea in #329

parse:ovs: Ignore deprecated OpenFlow1.6 protocol (LP#1963735) by @slyon in #332

dbus: Build the copy path correctly by @daniloegea in #331

tests: add new spread based snapd integration test by @mvo5 in #330

Use controlled execution environment, to avoid failure if PATH is unset (LP#1959570) by @slyon in #336

Some refactoring by @slyon in #338

netplan: adjust the maximum buffer size to 1MB by @daniloegea in #340

parse: use with systemd-escape by @daniloegea in #347

docs: fix bridge parameters types and add examples by @daniloegea in #346

vrfs: skip policies parsing if list is NULL (LP#2016427) by @daniloegea in #341

networkd: plug a memory leak by @daniloegea in #344

libnetplan: don t try to read from a NULL file by @daniloegea in #342

nm: return if write_routes() fails by @daniloegea in #345

parse: plug a memory leak by @daniloegea in #348

parse: set the backend on nm-devices to NM by default by @daniloegea in #349

parse: don t point to the wrong node on validation by @daniloegea in #343

rtd: set the OS and Python versions explicitly by @daniloegea in #357

Fix 8021x eap method parsing (LP#2016625) by @daniloegea in #358

CI: update canonical/setup-lxd to v0.1.1 by @barrettj12 in #359

CI: fix dch after adding the new 0.106.1 tag by @daniloegea in #364

Provide frequency to wpa_supplicant when in adhoc mode (LP#2020754) by @yevmel in #363

Improve the coverage of the memory leak tests by @daniloegea in #365

Fix keyfile parsing of wireguard config when the prefix of allowed IPs is omited by @daniloegea in #366

routes: fix metric rendering (LP#2023681) by @bengentil in #367

CI: add DebCI integration test by @slyon in #362

CI: initial NetworkManager autopkgtests by @slyon in #374

parse-nm: handle cloned-mac-address special cases (LP#2026230) by @daniloegea in #376

Improve autopkgtest stability with systemd 253 & iproute 6.4 by @slyon in #377

Fixes for minor issues by @daniloegea in #380

tests:integration: Adopt for systemd v254 (Closes: #1041310) by @slyon in #381

parse: Downgrade NM passthrough warning to debug by @slyon in #384

netplan.c: Don t drop files with just global values on set (LP#2027584) by @slyon in #382

Fixing Coverity issues by @daniloegea in #383

CLI: Refactoring to avoid namespace clash with public bindings by @slyon in #387

tests: fix test coverage report with newer python-coverage by @daniloegea in #389

github: add a scheduled action to run Coverity by @daniloegea in #391

github: only run the coverity workflow on our repository by @daniloegea in #392

Addressing a few issues found by coverity by @daniloegea in #393

Wireguard fixes by @daniloegea in #352

Fix a memory leak, an assert and an error message by @daniloegea in #350

ovs: don t allow peers with the same name by @daniloegea in #353

CI: make use of the canonical/setup-lxd action by @slyon in #356

17 August 2023

Jonathan Carter: Debian 30th Birthday: Local Group event and Interview

Inspired by the fine Debian Local Groups all over the world, I ve long since wanted to start one in Cape Town. Unfortunately, there s been many obstacles over the years. Shiny distractions, an epidemic, DPL terms these are just some of the things that got in the way. Fortunately, things are starting to gain traction, and we re well on our way to forming a bona fide local group for South Africa. We got together at Woodstock Grill, they have both a nice meeting room, and good food and beverage, also reasonably central for most of us.

Cake Starting with the important stuff, we got this Debian cake made that ended up much bigger than we expected, at least we all got to take some home too! (it tasted great too)

Yes, cake.

Talk This event was planned very last minute, so we didn t do any kind of RSVP and I had no idea who exactly would show up, so I went ahead and prepared one of my usual introduction to Linux and Debian talks, and how these things are having an impact on the world out there. I also talked a bit about the community and how we intend to grow our local team here in South Africa. It turned out most of the audience were already experienced Linux users, but I was happy to see that they were very enthusiastic about the local group concept!

While reading through some material to find some inspiration for this talk, I came across an old quote from the original Debian Manifesto that I found very poignant again, so I feel compelled to share (didn t use it in my talk this time though since I didn t cover much current events): The time has come to concentrate on the future of Linux rather than on the destructive goal of enriching oneself at the expense of the entire Linux community and its future. Ian Murdock, 1994

Debian-ZA logo Tammy spent some time creating a whole bunch of logo concepts, that she presented to us. They aren t meant as final logo choices, but as initial concepts, and they worked well to invoke a very lively discussion about logos and design!

Here are just some of her designs that I cherry picked since they were the most discussed. We still haven t decided if it will be Debian ZA or Debian-ZA or Debian South Africa, although the latter will probably cause least confusion internationally.

Personally, the last one on this image that I referred to as the carpet is my personal favourite :-)

Happy Birthday Song John and Heila wrote a happy birthday song. After seeing the lyrics (and considering myself an amateur lyricist) I thought it was way too tacky and I told John to put it away. But when the cake came out, someone said we should sing! and the lyrics quickly re-emerged and was handed out to everyone. It s also meant to be a loopy clip for the upcoming DebConf23. I ll concede that it worked out alright in the end! You judge for yourself:

Mousepads People still use mousepads? That was my initial reaction when Heila told me that she was going to make some commemorative 30 year Debian mousepads for our birthday event, and they ended up being popular. It s probably a safer surface to put dev boards on than on my desk directly, so at least I do have a use for one!

Group Photo The group photo, complete with disco ball! We should ve taken this earlier because it was already getting late and some people had to head back home. Lesson learned for next time!

30th Birthday Interview with The Changelog I also did an interview with The Changelog for Debian s 30th birthday. It was late, and I haven t had a chance to listen to it yet, so I hope their producers managed to edit something coherent out of my usual Debian babbling:

More later There s so much more I d like to say about Debian, the last 30 years, local groups, the eco-system that we find ourselves in. And, Lemmy! But I ll see if I can get an instance up over the weekend and will then talk about that some more another time.

4 August 2023

John Goerzen: Try the Last Internet Kermit Server

$ grep kermit /etc/services
kermit          1649/tcp

What is this mysterious protocol? Who uses it and what is its story? This story is a winding one, beginning in 1981. Kermit is, to the best of my knowledge, the oldest actively-maintained software package with an original developer still participating. It is also a scripting language, an Internet server, a (scriptable!) SSH client, and a file transfer protocol. And my first use of it was talking to my HP-48GX calculator over a 9600bps serial link. Yes, that calculator had a Kermit server built in. But let s back up and talk about serial ports and Modems.

Serial Ports and Modems In my piece The PC & Internet Revolution in Rural America, I recently talked about getting a modem what an excitement it was to get one! I realize that many people today have never used a serial line or a modem, so let s briefly discuss. Before Ethernet and Wifi took off in a big way, in the 1990s-2000s, two computers would talk to each other over a serial line and a modem. By modern standards, these were slow; 300bps was a common early speed. They also (at least in the beginning) had no kind of error checking. Characters could be dropped or changed. Sometimes even those speeds were faster than the receiving device could handle. Some serial links were 7-bit, and wouldn t even pass all 7-bit characters; for instance, sending a Ctrl-S could lock up a remote until you sent Ctrl-Q. And computers back in the 1970s and 1980s weren t as uniform as they are now. They used different character sets, different line endings, and even had different notions of what a file is. Today s notion of a file as whatever set of binary bytes an application wants it to be was by no means universal; some systems treated a file as a set of fixed-length records, for instance. So there were a lot of challenges in reliably moving files between systems. Kermit was introduced to reliably move files between systems using serial lines, automatically working around the varieties of serial lines, detecting errors and retransmitting, managing transmit speeds, and adapting between architectures as appropriate. Quite a task! And perhaps this explains why it was supported on a calculator with a primitive CPU by today s standards. Serial communication, by the way, is still commonplace, though now it isn t prominent in everyone s home PC setup. It s used a lot in industrial equipment, avionics, embedded systems, and so forth. The key point about serial lines is that they aren t inherently multiplexed or packetized. Whereas an Ethernet network is designed to let many dozens of applications use it at once, a serial line typically runs only one (unless it is something like PPP, which is designed to do multiplexing over the serial line). So it become useful to be able to both log in to a machine and transfer files with it. That is, incidentally, still useful today.

Kermit and XModem/ZModem I wondered: why did we end up with two diverging sets of protocols, created at about the same time? The Kermit website has the answer: essentially, BBSs could assume 8-bit clean connections, so XModem and ZModem had much less complexity to worry about. Kermit, on the other hand, was highly flexible. Although ZModem came out a few years before Kermit had its performance optimizations, by about 1993 Kermit was on par or faster than ZModem.

Beyond serial ports As LANs and the Internet came to be popular, people started to use telnet (and later ssh) to connect to remote systems, rather than serial lines and modems. FTP was an early way to transfer files across the Internet, but it had its challenges. Kermit added telnet support, as well as later support for ssh (as a wrapper around the `ssh` command you already know). Now you could easily log in to a machine and exchange files with it without missing a beat. And so it was that the Internet Kermit Service Daemon (IKSD) came into existence. It allows a person to set up a Kermit server, which can authenticate against local accounts or present anonymous access akin to FTP. And so I established the quux.org Kermit Server, which runs the Unix IKSD (part of the Debian ckermit package).

Trying Out the quux.org Kermit Server There are more instructions on the quux.org Kermit Server page! You can connect to it using either telnet or the kermit program. I won t duplicate all of the information here, but here s what it looks like to connect:

$ kermit
C-Kermit 10.0 Beta.08, 15 Dec 2022, for Linux+SSL (64-bit)
 Copyright (C) 1985, 2022,
  Trustees of Columbia University in the City of New York.
  Open Source 3-clause BSD license since 2011.
Type ? or HELP for help.
(/tmp/t/) C-Kermit>iksd /user:anonymous kermit.quux.org
 DNS Lookup...  Trying 135.148.101.37...  Reverse DNS Lookup... (OK)
Connecting to host glockenspiel.complete.org:1649
 Escape character: Ctrl-\ (ASCII 28, FS): enabled
Type the escape character followed by C to get back,
or followed by ? to see other options.
----------------------------------------------------

 >>> Welcome to the Internet Kermit Service at kermit.quux.org <<<

To log in, use 'anonymous' as the username, and any non-empty password

Internet Kermit Service ready at Fri Aug  4 22:32:17 2023
C-Kermit 10.0 Beta.08, 15 Dec 2022
kermit

Enter e-mail address as Password: [redacted]

Anonymous login.

You are now connected to the quux kermit server.

Try commands like HELP, cd gopher, dir, and the like.  Use INTRO
for a nice introduction.

(~/) IKSD>

You can even recursively download the entire Kermit mirror: over 1GB of files!

Conclusions So, have fun. Enjoy this experience from the 1980s. And note that Kermit also makes a better ssh client than ssh in a lot of ways; see ideas on my Kermit page. This page also has a permanent home on my website, where it may be periodically updated.

21 July 2023

Gunnar Wolf: Road trip through mountain ridges to find the surreal

We took a couple of days of for a family vacation / road trip through the hills of Central Mexico. The overall trip does not look like anything out of the ordinary

Other than the fact that Google forecasted we d take approximately 15.5 hours driving for 852Km that is, an average of almost 55 Km/h. And yes, that s what we signed up for. And that s what we got. Of course, the exact routes are not exactly what Google suggested (I can say we optimized a bit the route, i.e., by avoiding the metropolitan area of Quer taro, at the extreme west, and going via San Juan del R o / Tequisquiapan / Bernal). The first stretch of the road is just a regular, huge highway, with no particular insights. The highways leaving and entering Mexico City on the North are not fun nor beautiful, only they are needed to get nice trips going Mexico City sits at a point of changing climates. Of course, it is a huge city And I cannot imagine how it would be without all of the urbanization it now sports. But anyway: On the West, South, and part of the East, it is surrounded by high mountains, with beautiful and dense forests. Mexico City is 2200m high, and most of the valley s surrounding peaks are ~3000m (and at the South Eastern tip, our two big volcanoes, Popocat petl and Iztacc huatl, get past the 5700m mark). Towards the North, the landscape is flatter and much more dry. Industrial compounds give way to dry grasslands. Of course, central Mexico does not understand the true meaning of flat, and the landscape is full with eh-not-very-big mountains. Then, as we entered Quer taro State, we started approaching Bernal. And we saw a huge rock that looks like it is not supposed to be there! It just does not fit the surroundings.

Shortly after Bernal, we entered a beautiful, although most crumpled, mountain ridge: Sierra Gorda de Quer taro. Sierra Gorda encompasses most of the North of the (quite small 11500Km total) state of Quer taro, plus portions of the neighboring states; other than the very abrupt and sharp orography, what strikes me most is the habitat diversity it encompasses. We started going up an absolute desert, harsh and beautiful; we didn t take pictures along the way as the road is difficult enough that there are almost no points for stopping for refreshments or for photo opportunities. But it is quite majestic. And if you think deserts are barren, boring places well, please do spend some time enjoying them! Anyway At on point, the road passes by a ~3100m height, and suddenly Pines! More pines! A beautiful forest! We reached our first stop at the originally mining town of Pinal de Amoles.

After spending the night there and getting a much needed rest, we started a quite steep descent towards Jalpan de Serra. While it is only ~20Km away on the map, we descended from 2300 to 760 meters of altitude (and the road was over 40Km long).

Being much lower, the climate drastically changed from cool and humid to quite warm and the body attitude in the kids does not lie!

In the mid-18th century, Fray Jun pero Serra established five missions to evangelize the population of this very harsh territory, and the frontispiece for the church and monastery in Jalpan is quite breathtaking.

But we were just passing by Jalpan. A short visit to the church and to the ice-cream shop, and we were again on our way. We crossed the state border, entering San Luis Potos , and arrived to our main destination: Xilitla, the little town in the beautiful Huasteca where the jungle meets surrealism. Xilitla was chosen by the British poet and patron of various surrealist artists https://en.wikipedia.org/wiki/Edward_James. He was a British noble (an unofficial grandson of King Edward VII), and heir to a huge fortune. I m not going to repeat here his very well known biography suffice to say that he got in love with the Huasteca, and bought a >30ha piece of jungle and mountain close to the Xilitla town, and made it his house. With very ample economic resources, in the late 1940s he started his lifelong project of building a surrealist garden. And Well, that s enough blabbering for me. I m sharing some pictures I took there. The place is plainly magic and wonderful. Edward James died in 1984, and his will decrees that after his death, the jungle should be allowed to reclaim the constructions so many structures are somewhat crumbling, and it is expected they will break down in the following decades. But for whoever comes to Mexico This magic place is definitely worth the heavy ride to the middle of the mountains and to the middle of the jungle.

Xilitla now also hosts a very good museum with sculptures by Leonora Carrington, James long-time friend, but I m not going to abuse this space with even more pictures. And of course, we did more, and enjoyed more, during our three days in Xilitla. And for our way back I wanted to try a different route. We decided to come back to Mexico City crossing Hidalgo state instead of Quer taro. I had feared the roads would be in a worse shape or would be more difficult to travel And I was happy to be proven wrong!

This was the longest driving stretch approximately 6:30 for 250Km. The roads are in quite decent shape, and while there are some stretches where we were quite lonely (probably the loneliest one was the sharp ascent from Tamazunchale to the detour before Orizatl n), the road felt safe and well kept at all times. The sights all across Eastern Hidalgo are breathtaking, and all furiously green (be it with really huge fern leaves or with tall, strong pines), until Zacualtip n.

And just as abruptly or more as when we entered Pinal de Amoles We crossed Orizatl n, and we were in a breathtaking arid, desert-like environment again. We crossed the Barranca de Metztitl n natural reserve, and arrived to spend the night at Huasca de Ocampo. There are many more things we could have done starting at Huasca, a region where old haciendas thrived, full of natural formations, and very very interesting. But we were tired and pining to be finally back home. So we rested until mid-morning and left straight back home in Mexico City. Three hours later, we were relaxing, preparing lunch, the kids watching whatever-TV-like-things are called nowadays. All in all, a very beautiful vacation!

12 July 2023

John Goerzen: Backing Up and Archiving to Removable Media: dar vs. git-annex

This is the fourth in a series about archiving to removable media (optical discs such as BD-Rs and DVD+Rs or portable hard drives). Here are the first three parts:

In part 1, I laid out my goals for the project, and considered a number of tools before determining dar and git-annex were my leading options.
In part 2, I took a deep dive into git-annex and simulated using it for this project.
In part 3, I did the same with dar.
And in this part, I want to put it together to come up with an initial direction to pursue.

I want to state at the outset that this is not a general review of dar or git-annex. This is an analysis of how those tools stack up to a particular use case. Neither tool focuses on this use case, and I note it is particularly far from the more common uses of git-annex. For instance, both tools offer support for cloud storage providers and special support for ssh targets, but neither of those are in-scope for this post. Comparison Matrix As part of this project, I made a comparison matrix which includes not just dar and git-annex, but also backuppc, bacula/bareos, and borg. This may give you some good context, and also some reference for other projects in this general space. Reviewing the Goals I identified some goals in part 1. They are all valid. As I have thought through the project more, I feel like I should condense them into a simpler ordered list, with the first being the most important. I omit some things here that both dar and git-annex can do (updates/incrementals, for instance; see the expanded goals list in part 1). Here they are:

The tool must not modify the source data in any way.
It must be simple to create or update an archive. Processes that require a lot of manual work, are flaky, or are difficult to do correctly, are unlikely to be done correctly and often. If it s easy to do right, I m more likely to do it. Put another way: an archive never created can never be restored.
The chances of a successful restore by someone that is not me, that doesn t know Linux, and is at least 10 years in the future, should be maximized. This implies a simple toolset, solid support for dealing with media errors or missing media, etc.
Both a partial point-in-time restore and a full restore should be possible. The full restore must, at minimum, provide a consistent directory tree; that is, deletions, additions, and moves over time must be accurately reflected. Preserving modification times is a near-requirement, and preserving hard links, symbolic links, and other POSIX metadata is a significant nice-to-have.
There must be a strategy to provide redundancy; for instance, a way for one set of archive discs to be offsite, another onsite, and the two to be periodically swapped.
Use storage space efficiently.

Let s take a look at how the two stack up against these goals. Goal 1: Not modifying source data With dar, this is accomplished. dar --create does not modify source data (and even has a mode to avoid updating atime) so that s done. git-annex normally does modify source data, in that it typically replaces files with symlinks into its hash-indexed storage directory. It can instead use hardlinks. In either case, you will wind up with files that have identical content (but may have originally been separate, non-linked files) linked together with git-annex. This would cause me trouble, as well as run the risk of modifying timestamps. So instead of just storing my data under a git-annex repo as is its most common case, I use the directory special remote with importtree=yes to sort of import the data in. This, plus my desire to have the repos sensible and usable on non-POSIX operating systems, accounts for a chunk of the git-annex complexity you see here. You wouldn t normally see as much complexity with git-annex (though, as you will see, even without the directory special remote, dar still has less complexity). Winner: dar, though I demonstrated a working approach with git-annex as well. Goal 2: Simplicity of creating or updating an archive Let us simply start by recognizing this:

Number of commands to create a first dar archive, including all splits: 1
Number of commands to create a first git-annex archive, with just the first two splits: 58
Number of commands to create a dar incremental: 1
Number of commands to update the last git-annex drive: 10
Number of commands to do a full restore of all slices and both archives with dar: 2 (1 if dar_manager used)
Number of commands to do a full restore of just the first two drive with git-annex: 9 (but my process may not be correct)

Both tools have a lot of power, but I must say, it is easier to wrap my head around what dar is doing than what git-annex is doing. Everything dar does is with files: here are the files to archive, here is an archive file, here is a detached (isolated) catalog. It is very straightforward. It took me far less time to develop my dar page than my git-annex page, despite having existing familiarity with both tools. As I pointed out in part 2, I still don t fully understand how git-annex syncs metadata. Unsolved mysteries from that post include why the two git-annex drives had no idea what was on the other drives, and why the export operation silenty did nothing. Additionally, for the optical disc case, I had to create a restricted-size filesystem/dataset for git-annex to write into in order to get the desired size limit. Looking at the optical disc case, dar has a lot of nice infrastructure built in. With pause and execute, it can very easily be combined with disc burning operations. slice will automatically limit the size of a given slice, regardless of how much disk space is free, meaning that the git-annex tricks of creating smaller filesystems/datasets are unnecessary with dar. To create an initial full backup with dar, you just give it the size of the device, and it will automatically split up the archive, with hooks to integrate for burning or changing drives. About as easy as you could get. With git-annex, you would run the commands to have it fill up the initial filesystem, then burn the disc (or remove the drive), then run the commands to create another repo on the second filesystem, and so forth. With hard drives, with git-annex you would do something similar; let it fill up a repo on a drive, and if it exits with a space error, swap in the next. With dar, you would slice as with an optical disk. Dar s slicing is less convenient in this case, though, as it assumes every drive is the same size and yours may not be. You could work around that by using a slice size no bigger than the smallest drive, and putting multiple slices on larger drives if need be. If a single drive is large enough to hold your entire data set, though, you need not worry about this with either tool. Here s a warning about git-annex: it won t store anything beneath directories named .git. My use case doesn t have many of those. If your use case does, you re going to have to figure out what to do about it. Maybe rename them to something else while the backup runs? In any case, it is simply a fact that git-annex cannot back up git repositories, and this cuts against being able to back up things correctly. Another point is that git-annex has scalability concerns. If your archive set gets into the hundreds of thousands of files, you may need to split it into multiple distinct git-annex repositories. If this occurs and it will in my case it may serve to dull the shine of some of git-annex s features such as location tracking. A detour down the update strategies path Update strategies get a little more complicated with both. First, let s consider: what exactly should our update strategy be? For optical discs, I might consider doing a monthly update. I could burn a disc (or more than one, if needed) regardless of how much data is going to go onto it, because I want no more than a month s data lost in any case. An alternative might be to spool up data until I have a disc s worth, and then write that, but that could possibly mean months between actually burning a disc. Probably not good. For removable drives, we re unlikely to use a new drive each month. So there it makes sense to continue writing to the drive until it s full. Now we have a choice: do we write and preserve each month s updates, or do we eliminate intermediate changes and just keep the most recent data? With both tools, the monthly burn of an optical disc turns out to be very similar to the initial full backup to optical disc. The considerations for spanning multiple discs are the same. With both tools, we would presumably want to keep some metadata on the host so that we don t have to refer to a previous disc to know what was burned. In the dar case, that would be an isolated catalog. For git-annex, it would be a metadata-only repo. I illustrated both of these in parts 2 and 3. Now, for hard drives. Assuming we want to continue preserving each month s updates, with dar, we could just write an incremental to the drive each month. Assuming that the size of the incremental is likely far smaller than the size of the drive, you could easily enough do this. More fancily, you could look at the free space on the drive and tell dar to use that as the size of the first slice. For git-annex, you simply avoid calling drop/dropunused. This will cause the old versions of files to accumulate in .git/annex. You can get at them with git annex commands. This may imply some degree of elevated risk, as you are modifying metadata in the repo each month, which with dar you could chmod a-w or even chattr +i the archive files once written. Hopefully this elevated risk is low. If you don t want to preserve each month s updates, with dar, you could just write an incremental each month that is based on the previous drive s last backup, overwriting the previous. That implies some risk of drive failure during the time the overwrite is happening. Alternatively, you could write an incremental and then use dar to merge it into the previous incremental, creating a new one. This implies some degree of extra space needed (maybe on a different filesystem) while doing this. With git-annex, you would use drop/dropunused as I demonstrated in part 2. The winner for goal 2 is dar. The gap is biggest with optical discs and more narrow with hard drives, thanks to git-annex s different options for updates. Still, I would be more confident I got it right with dar. Goal 3: Greatest chance of successful restore in the distant future If you use git-annex like I suggested in part 2, you will have a set of discs or drives that contain a folder structure with plain files in them. These files can be opened without any additional tools at all. For sheer ability to get at raw data, git-annex has the edge. When you talk about getting a consistent full restore without multiple copies of renamed files or deleted files coming back then you are going to need to use git-annex to do that. Both git-annex and dar provide binaries. Dar provides a win64 version on its Sourceforge page. On the author s releases site, you can find the win64 version in addition to a statically-linked x86_64 version for Linux. The git-annex install page mostly directs you to package managers for your distribution, but the downloads page also lists builds for Linux, Windows, and Mac OS X. The Linux version is dynamic, but ships most of its .so files alongside. The Windows version requires cygwin.dll, and all versions require you to also install git itself. Both tools are in package managers for Mac OS X, Debian, FreeBSD, and so forth. Let s just say that you are likely to be able to run either one on a future Windows or Linux system. There are also GUI frontends for dar, such as DARGUI and gdar. This can increase the chances of a future person being able to use the software easily. git-annex has the assistant, which is based on a different use case and probably not directly helpful here. When it comes to doing the actual restore process using software, dar provides the easier process here. For dealing with media errors and the like, dar can integrate with par2. While technically you could use par2 against the files git-annex writes, that s more cumbersome to manage to the point that it is likely not to be done. Both tools can deal reasonably with missing media entirely. I m going to give the edge on this one to git-annex; while dar does provide the easier restore and superior tools for recovering from media errors, the ability to access raw data as plain files without any tools at all is quite compelling. I believe it is the most critical advantage git-annex has, and it s a big one. Goal 4: Support high-fidelity partial and full restores Both tools make it possible to do a full restore reflecting deletions, additions, and so forth. Dar, as noted, is easier for this, but it is possible with git-annex. So, both can achieve a consistent restore. Part of this goal deals with fidelity of the restore: preserving timestamps, hard and symbolic links, ownership, permissions, etc. Of these, timestamps are the most important for me. git-annex can t do any of that. dar does all of it. Some of this can be worked around using mtree as I documented in part 2. However, that implies a need to also provide mtree on the discs for future users, and I m not sure mtree really exists for Windows. It also cuts against the argument that git-annex discs can be used without any tools. It is true, they can, but all you will get is filename and content; no accurate date. Timestamps are often highly relevant for everything from photos to finding an elusive document or record. Winner: dar. Goal 5: Supporting backup strategies with redundancy My main goal here is to have two separate backup sets: one that is offsite, and one that is onsite. Depending on the strategy and media, they might just always stay that way, or periodically rotate. For instance, with optical discs, you might just burn two copies of every disc and store one at each place. For hard drives, since you will be updating the content of them, you might swap them periodically. This is possible with both tools. With both tools, if using the optical disc scheme I laid out, you can just burn two identical copies of each disc. With the hard drive case, with dar, you can keep two directories of isolated catalogs, one for each drive set. A little identifier file on each drive will let you know which set to use. git-annex can track locations itself. As I demonstrated in part 2, you can make each drive its own repo, add all drives from a given drive set to a git-annex group. When initializing a drive, you tell git-annex what group it s a prt of. From then on, git-annex knows what content is in each group and will add whatever a given drive s group needs to that drive. It s possible to do this with both, but the winner here is git-annex. Goal 6: Efficient use of storage Here are situations in which one or the other will be more efficient:

Lots of small files: dar, due to reduced filesystem overhead
Compressible data: dar (git-annex doesn t support compression)
Renamed files: git-annex (it will detect the sha256 match and avoid storing a duplicate copy)
Identical files: git-annex, unless they are hardlinked already (again, detects the sha256 match)
Small modifications to files (eg, ID3 tags on MP3s, EXIF data on photos, etc): dar (it supports rsync-style binary deltas)

The winner depends on your particular situation. Other notes While not part of the goals above, dar is capable of using tapes directly. While not as common, they are often used in communities of people that archive lots of data. Conclusions Overall, dar is the winner for me. It is simpler in most areas, easier to get correct, and scales very well. git-annex does, however, have some quite compelling points. Being able to access files as plain files is huge, and its location tracking is nicer than dar s, even when using dar_manager. Both tools are excellent and I recommend them both and for more than the particular scenario shown here. Both have fantastic and responsive authors.

26 June 2023

Lukas M rdian: Netplan 0.106.1 stable release

We are happy to announce that Netplan 0.106.1 is available for download on Ubuntu Mantic Minotaur and Debian testing. This release includes some improvements in our documentation and CI infrastructure and a number of bug fixes.

What s new in Netplan 0.106.1?

Documentation

New Netplan tutorial. A tutorial for beginners on how to get started with Netplan was included in our main documentation. It s intended to help newcomers to get familiar with Netplan and guide the reader through the basics to start using it. The new tutorial can be found in this link https://netplan.readthedocs.io/en/0.106.1/tutorial/

Netplan How-tos. The how-tos section was restructured and expanded with more use cases. The how-tos can be found in this link https://netplan.readthedocs.io/en/0.106.1/howto/

Manual pages. All Netplan subcommands now have their own man page. The online version of the man pages can be found in this link https://netplan.readthedocs.io/en/0.106.1/cli/

Desktop integration how-to. In Ubuntu Mantic (23.10), Netplan and Network Manager work in tandem to generate network configuration. This integration is described in more details here https://netplan.readthedocs.io/en/0.106.1/netplan-everywhere/

Infrastructure

canonical/setup-lxd GitHub action. The autopkgtest environment creation was standardized to use Canonical s setup-lxd action.

Snapd integrations tests with spread. A new test set for the Snapd integration with Netplan was introduced using the spread tool.

DBus. A number of DBus integration tests were added to the Debian package.

New features

Keyfile parser improvements. Our Network Manager keyfile parser (the capability of loading Network Manager configuration to Netplan YAML) was expanded to support all the types of tunnels supported by Netplan.

Misc

Ubuntu s Code of Conduct 2.0 was added to the code repository.

We added a new bash autocompletion script with all the Netplan s subcommands.

The new release package was synchronized with Debian.

Bug fixes

Keyfile parser. This release contains a couple of important fixes for the NetworkManager integration stability: 1) adding WPA enterprise connections is now working fine and new test cases were added to the package; 2) a WireGuard peer with allowed IPs that don t include the network prefix are now accepted.

Netplan parser. A number of memory leaks and stability issues were fixed.

DBus. An issue related to how directory paths are built in the Netplan DBus service was causing issues in the Snapd integration and was fixed.

For the complete list of changes please consult the debian/changelog file in https://launchpad.net/ubuntu/+source/netplan.io/+changelog

17 June 2023

John Goerzen: Using dar for Data Archiving

This is the third post in a series about data archiving to removable media (optical discs and hard drives). In the first, I explained the difference between backing up and archiving, established goals for the project, and said I d evaluate git-annex and dar. The second post evaluated git-annex, and now it s time to look at dar. The series will conclude with a post comparing git-annex with dar. What is dar? I could open with the same thing I did with git-annex, just changing the name of the program: [dar] is a fantastic and versatile program that does well, it s one of those things that can do so much that it s a bit hard to describe. It is, fundamentally, an archiver like tar or zip (makes one file representing a bunch of other files), but it goes far beyond that. dar s homepage lays out a comprehensive list of features, which I will try to summarize here.

Dar itself is both a library (with C++ and Python bindings) for interacting with data, and a CLI tool (dar itself).
Alongside this, there is an ecosystem of tools around dar, including GUIs for multiple platforms, backup scripts, and FUSE implementations.
Dar is like tar in that it can read and write files sequentially if desired. Dar archives can be streamed, just like tar archives. But dar takes it further; if you have dar_slave on the remote end, random access is possible over ssh (dramatically speeding up certain operations).
Dar is like zip in that a dar archive contains a central directory (called a catalog) which permits random access to the contents of an archive. In other words, you don t have to read an entire archive to extract just one file (assuming the archive is on disk or something that itself permits random access). Also, dar can compress each file individually, rather than the tar approach of compressing the archive as a whole. This increases archive performance (dar knows not to try to compress already-compressed data), boosts restore resilience (corruption of one part of an archive doesn t invalidate the entire rest of it), and boosts restore performance (permitting random access).
Dar can split an archive into multiple pieces called slices, and it can even split member files among the slices. The catalog contains information allowing you to know which slice(s) a given file is saved in.
The catalog can also be saved off in a file of its own (dar calls this an isolated catalog ). Isolated catalogs record just metadata about files archived.
dar_manager can assemble a database by reading archives or isolated catalogs, letting you know where files are stored and facilitating restores using the minimal number of discs.
Dar supports differential/incremental backups, which record changes since the last backup. These backups record not just additions, but also deletions. dar can optionally use rsync-style binary deltas to minimize the space needed to record changes. Dar does not suffer from GNU tar s data loss bug with incrementals.
Dar can slice and dice archives like Perl does strings. The usage notes page shows how you can merge archives, create decremental archives (where the full backup always reflects the current state of the system, and incrementals go backwards in time instead of forwards), etc. You can change the compression algorithm on an existing archive, re-slice it, etc.
Dar is extremely careful about preserving all metadata: hard links, sparse files, symlinks, timestamps (including subsecond resolution), EAs, POSIX ACLs, resource forks on Mac, detecting files being modified while being read, etc. It makes a nice way to copy directories, sort of similar to rsync -avxHAXS.

So to tie this together for this project, I will set up a 400MB slice size (to mimic what I did with git-annex), and see how dar saves the data and restores it. Isolated cataloges aren t strictly necessary for this, but by using them (and/or dar_manager), we can build up a database of files and locations and thus directly compare dar to git-annex location tracking. Walkthrough: Creating the first archive As with the git-annex walkthrough, I ll set some variables to make it easy to remember:

$SOURCEDIR is the directory being backed up
$DRIVE is the directory for backups to be stored in. Since dar can split by a specified size, I don t need to make separate filesystems to simulate the separate drive experience as I did with git-annex.
$CATDIR will hold isolated catalogs
$DARDB points to the dar_manager database

OK, we can run the backup immediately. No special setup is needed. dar supports both short-form (single-character) parameters and long-form ones. Since the parameters probably aren t familiar to everyone, I will use the long-form ones in these examples. Here s how we create our initial full backup. I ll explain the parameters below:



$ dar \

  --verbose \

  --create $DRIVE/bak1 \

  --on-fly-isolate $CATDIR/bak1 \

  --slice 400M \

  --min-digits 2 \

  --pause \

  --fs-root $SOURCEDIR

Let s look at each of these parameters:

verbose does what you expect
create selects the operation mode (like tar -c) and gives the archive basename
on-fly-isolate says to write an isolated catalog as well, right while making the archive. You can always create an isolated catalog later (which is fast, since it only needs to read the last bits of the last slice) but it s more convenient to do it now, so we do. We give the base name for the isolated catalog also.
slice 400M says to split the archive, and create slices 400MB each.
min-digits 2 pertains to naming files. Without it, dar would create files named bak1.dar.1, bak1.dar.2, bak1.dar.10, etc. dar works fine with this, but it can be annoying in ls. This is just convenience for humans.
pause tells dar to pause after writing each slice. This would let us swap drives, burn discs, etc. I do this for demonstration purposes only; it isn t strictly necessary in this situation. For a more powerful option, dar also supports execute, which can run commands after each slice.
fs-root gives the path to actually back up.

This same command could have been written with short options as:



$ dar -v -c $DRIVE/bak1 -@ $CATDIR/bak1 -s 400M -9 2 -p -R $SOURCEDIR

What does it look like while running? Here s an excerpt:



...

Adding file to archive: /acrypt/no-backup/jgoerzen/testdata/[redacted]

Finished writing to file 1, ready to continue ?  [return = YES   Esc = NO]

...

Writing down archive contents...

Closing the escape layer...

Writing down the first archive terminator...

Writing down archive trailer...

Writing down the second archive terminator...

Closing archive low layer...

Archive is closed.

 --------------------------------------------

 581 inode(s) saved

   including 0 hard link(s) treated

 0 inode(s) changed at the moment of the backup and could not be saved properly

 0 byte(s) have been wasted in the archive to resave changing files

 0 inode(s) with only metadata changed

 0 inode(s) not saved (no inode/file change)

 0 inode(s) failed to be saved (filesystem error)

 0 inode(s) ignored (excluded by filters)

 0 inode(s) recorded as deleted from reference backup

 --------------------------------------------

 Total number of inode(s) considered: 581

 --------------------------------------------

 EA saved for 0 inode(s)

 FSA saved for 581 inode(s)

 --------------------------------------------

Making room in memory (releasing memory used by archive of reference)...

Now performing on-fly isolation...

...

That was easy! Let s look at the contents of the backup directory:



$ ls -lh $DRIVE

total 3.7G

-rw-r--r-- 1 jgoerzen jgoerzen 400M Jun 16 19:27 bak1.01.dar

-rw-r--r-- 1 jgoerzen jgoerzen 400M Jun 16 19:27 bak1.02.dar

-rw-r--r-- 1 jgoerzen jgoerzen 400M Jun 16 19:27 bak1.03.dar

-rw-r--r-- 1 jgoerzen jgoerzen 400M Jun 16 19:27 bak1.04.dar

-rw-r--r-- 1 jgoerzen jgoerzen 400M Jun 16 19:28 bak1.05.dar

-rw-r--r-- 1 jgoerzen jgoerzen 400M Jun 16 19:28 bak1.06.dar

-rw-r--r-- 1 jgoerzen jgoerzen 400M Jun 16 19:28 bak1.07.dar

-rw-r--r-- 1 jgoerzen jgoerzen 400M Jun 16 19:28 bak1.08.dar

-rw-r--r-- 1 jgoerzen jgoerzen 400M Jun 16 19:29 bak1.09.dar

-rw-r--r-- 1 jgoerzen jgoerzen 156M Jun 16 19:33 bak1.10.dar

And the isolated catalog:



$ ls -lh $CATDIR

total 37K

-rw-r--r-- 1 jgoerzen jgoerzen 35K Jun 16 19:33 bak1.1.dar

The isolated catalog is stored compressed automatically. Well this was easy. With one command, we archived the entire data set, split into 400MB chunks, and wrote out the catalog data. Walkthrough: Inspecting the saved archive Can dar tell us which slice contains a given file? Sure:



$ dar --list $DRIVE/bak1 --list-format=slicing   less

Slice(s) [Data ][D][ EA  ][FSA][Compr][S] Permission  Filemane

--------+--------------------------------+----------+-----------------------------

...

1        [Saved][ ]       [-L-][   0%][X] -rwxr--r-- [redacted]

1-2      [Saved][ ]       [-L-][   0%][X] -rwxr--r-- [redacted]

2        [Saved][ ]       [-L-][   0%][X] -rwxr--r-- [redacted]

...

This illustrates the transition from slice 1 to slice 2. The first file was stored entirely in slice 1; the second stored partially in slice 1 and partially in slice 2, and third solely in slice 2. We can get other kinds of information as well.



$ dar --list $DRIVE/bak1   less

[Data ][D][ EA  ][FSA][Compr][S]  Permission   User    Group   Size               Date                      filename

--------------------------------+------------+-------+-------+---------+-------------------------------+------------

[Saved][ ]       [-L-][   0%][X]  -rwxr--r--   jgoerzen jgoerzen        24 Mio  Mon Mar  5 07:58:09 2018        [redacted]

[Saved][ ]       [-L-][   0%][X]  -rwxr--r--   jgoerzen jgoerzen        16 Mio  Mon Mar  5 07:58:09 2018        [redacted]

[Saved][ ]       [-L-][   0%][X]  -rwxr--r--   jgoerzen jgoerzen        22 Mio  Mon Mar  5 07:58:09 2018        [redacted]

These are the same files I was looking at before. Here we see they are 24MB, 16MB, and 22MB in size, and some additional metadata. Even more is available in the XML list format. Walkthrough: updates As with git-annex, I ve made some changes in the source directory: moved a file, added another, and deleted one. Let s create an incremental backup now:



$ dar \

  --verbose \

  --create $DRIVE/bak2 \

  --on-fly-isolate $CATDIR/bak2 \

  --ref $CATDIR/bak1 \

  --slice 400M \

  --min-digits 2 \

  --pause \

  --fs-root $SOURCEDIR

This command is very similar to the earlier one. Instead of writing an archive and catalog named bak1, we write one named bak2. What s new here is --ref $CATDIR/bak1. That says, make an incremental based on an archive of reference. All that is needed from that archive of reference is the detached catalog. --ref $DRIVE/bak1 would have worked equally well here. Here s what I did to the $SOURCEDIR:

Renamed a file to file01-unchanged
Deleted a file
Copied /bin/cp to a file named cp

Let s see if dar s command output matches this:



...

Adding file to archive: /acrypt/no-backup/jgoerzen/testdata/file01-unchanged

Saving Filesystem Specific Attributes for /acrypt/no-backup/jgoerzen/testdata/file01-unchanged

Adding file to archive: /acrypt/no-backup/jgoerzen/testdata/cp

Saving Filesystem Specific Attributes for /acrypt/no-backup/jgoerzen/testdata/cp

Adding folder to archive: [redacted]

Saving Filesystem Specific Attributes for [redacted]

Adding reference to files that have been destroyed since reference backup...

...

 --------------------------------------------

 3 inode(s) saved

   including 0 hard link(s) treated

 0 inode(s) changed at the moment of the backup and could not be saved properly

 0 byte(s) have been wasted in the archive to resave changing files

 0 inode(s) with only metadata changed

 578 inode(s) not saved (no inode/file change)

 0 inode(s) failed to be saved (filesystem error)

 0 inode(s) ignored (excluded by filters)

 2 inode(s) recorded as deleted from reference backup

 --------------------------------------------

 Total number of inode(s) considered: 583

 --------------------------------------------

 EA saved for 0 inode(s)

 FSA saved for 3 inode(s)

 --------------------------------------------

...

Yes, it does. The rename is recorded as a deletion and an addition, since dar doesn t directly track renames. So the rename plus the deletion account for the two deletions. The rename plus the addition of cp count as 2 of the 3 inodes saved; the third is the modified directory from which files were deleted and moved out. Let s see the files that were created:



$ ls -lh $DRIVE/bak2*

-rw-r--r-- 1 jgoerzen jgoerzen 18M Jun 16 19:52 /acrypt/no-backup/jgoerzen/dar-testing/drive/bak2.01.dar

$ ls -lh $CATDIR/bak2*

-rw-r--r-- 1 jgoerzen jgoerzen 22K Jun 16 19:52 /acrypt/no-backup/jgoerzen/dar-testing/cat/bak2.1.dar

What does list look like now?



Slice(s) [Data ][D][ EA  ][FSA][Compr][S] Permission  Filemane

--------+--------------------------------+----------+-----------------------------

         [     ][ ]       [---][-----][X] -rwxr--r-- [redacted]

1        [Saved][ ]       [-L-][   0%][X] -rwxr--r-- file01-unchanged

...

         [--- REMOVED ENTRY ----][redacted]

         [--- REMOVED ENTRY ----][redacted]

Here I show an example of:

A file that was not changed from the initial backup. Its presence was simply noted, but because we re doing an incremental, the data wasn t saved.
A file that is saved in this incremental, on slice 1.
The two deleted files

Walkthrough: dar_manager As we ve seen above, the two archives (or their detached catalog) give us a complete picture of what files were present at the time of the creation of each archive, and what files were stored in a given archive. We can certainly continue working in that way. We can also use dar_manager to build a comprehensive database of these archives, to be able to find what media is necessary to restore each given file. Or, with dar_manager s when parameter, we can restore files as of a particular date. Let s try it out. First, we create our database:



$ dar_manager --create $DARDB

$ dar_manager --base $DARDB --add $DRIVE/bak1

Auto detecting min-digits to be 2

$ dar_manager --base $DARDB --add $DRIVE/bak2

Auto detecting min-digits to be 2

Here we created the database, and added our two catalogs to it. (Again, we could have as easily used $CATDIR/bak1; either the archive or its isolated catalog will work here.) It s important to add the catalogs in order. Let s do some quick experimentation with dar_manager:



$ dar_manager -v --base $DARDB --list

Decompressing and loading database to memory...


dar path         :

dar options      :

database version : 6

compression used : gzip

compression level: 9
archive #        path           basename

------------+--------------+---------------

        1       /acrypt/no-backup/jgoerzen/dar-testing/drive    bak1

        2       /acrypt/no-backup/jgoerzen/dar-testing/drive    bak2

$ dar_manager --base $DARDB --stat

  archive #      most recent/total data    most recent/total EA

--------------+-------------------------+-----------------------

             1 580/581                        0/0

             2 3/3                            0/0

The list option shows the correlation between dar_manager archive number (1, 2) with filenames (bak1, bak2). It is coincidence here that 1/bak1 and 2/bak2 correlate; that s not necessarily the case. Most dar_manager commands operate on archive number, while dar commands operate on archive path/basename. Now let s see just what files are saved in archive #2, the incremental:



$ dar_manager --base $DARDB --used 2

[ Saved ][       ]  [redacted]

[ Saved ][       ]  file01-unchanged

[ Saved ][       ]  cp

Now we can also where a file is stored. Here s one that was saved in the full backup and unmodified in the incremental:



$ dar_manager --base $DARDB --file [redacted]

        1       Fri Jun 16 19:15:12 2023  saved                                 absent

        2       Fri Jun 16 19:15:12 2023  present                               absent

(The absent at the end refers to extended attributes that the file didn t have) Similarly, for files that were added or removed, they ll be listed only at the appropriate place. Walkthrough: Restoration I m not going to repeat the author s full restoration with dar page, but here are some quick examples. A simple way of doing everything is using incrementals for the whole series. To do that, you d have bak1 be full, bak2 based on bak1, bak3 based on bak2, bak4 based on bak3, etc. To restore from such a series, you have two options:

Use dar to simply extract each archive in order. It will handle deletions, renames, etc. along the way.
Use dar_manager with the backup database to do manage the process. It may be somewhat more efficient, as it won t bother to restore files that will later be modified or deleted.

If you get fancy for instance, bak2 is based on bak1, bak3 on bak2, bak4 on bak1 then you would want to use dar_manager to ensure a consistent restore is completed. Either way, the process is nearly identical. Also, I figure, to make things easy, you can save a copy of the entire set of isolated catalogs before you finalize each disc/drive. They re so small, and this would let someone with just the most recent disc build a dar_manager database without having to go through all the other discs. Anyhow, let s do a restore using just dar. I ll make a $RESTOREDIR and do it that way.



$ dar \

  --verbose \

  --extract $DRIVE/bak1 \

  --fs-root $RESTOREDIR \

  --no-warn \

  --execute "echo Ready for slice %n.  Press Enter; read foo"

This execute lets us see how dar works; this is an illustration of the power it has (above pause); it s a snippet interpreted by /bin/sh with %n being one of the dar placeholders. If memory serves, it s not strictly necessary, as dar will prompt you for slices it needs if they re not mounted. Anyhow, you ll see it first reading the last slice, which contains the catalog, then reading from the beginning. Here we go:



Auto detecting min-digits to be 2

Opening archive bak1 ...

Opening the archive using the multi-slice abstraction layer...

Ready for slice 10. Press Enter

...

Loading catalogue into memory...

Locating archive contents...

Reading archive contents...

File ownership will not be restored du to the lack of privilege, you can disable this message by asking not to restore file ownership [return = YES   Esc = NO]

Continuing...

Restoring file's data: [redacted]

Restoring file's FSA: [redacted]

Ready for slice 1. Press Enter

...

Ready for slice 2. Press Enter

...

 --------------------------------------------

 581 inode(s) restored

    including 0 hard link(s)

 0 inode(s) not restored (not saved in archive)

 0 inode(s) not restored (overwriting policy decision)

 0 inode(s) ignored (excluded by filters)

 0 inode(s) failed to restore (filesystem error)

 0 inode(s) deleted

 --------------------------------------------

 Total number of inode(s) considered: 581

 --------------------------------------------

 EA restored for 0 inode(s)

 FSA restored for 0 inode(s)

 --------------------------------------------

The warning is because I m not doing the extraction as root, which limits dar s ability to fully restore ownership data. OK, now the incremental:



$ dar \

  --verbose \

  --extract $DRIVE/bak2 \

  --fs-root $RESTOREDIR \

  --no-warn \

  --execute "echo Ready for slice %n.  Press Enter; read foo"

...

Ready for slice 1. Press Enter

...

Restoring file's data: /acrypt/no-backup/jgoerzen/dar-testing/restore/file01-unchanged

Restoring file's FSA: /acrypt/no-backup/jgoerzen/dar-testing/restore/file01-unchanged

Restoring file's data: /acrypt/no-backup/jgoerzen/dar-testing/restore/cp

Restoring file's FSA: /acrypt/no-backup/jgoerzen/dar-testing/restore/cp

Restoring file's data: /acrypt/no-backup/jgoerzen/dar-testing/restore/[redacted directory]

Removing file (reason is file recorded as removed in archive): [redacted file]

Removing file (reason is file recorded as removed in archive): [redacted file]

This all looks right! Now how about we compare the restore to the original source directory?



$ diff -durN $SOURCEDIR $RESTOREDIR

No changes perfect. We could instead do this restore via a single dar_manager command, though annoyingly, we d have to pass all top-level files/directories to dar_manager restore. But still, it s one command, and basically automates and optimizes the dar restores shown above. Conclusions Dar makes it extremely easy to just Do The Right Thing when making archives. One command makes a backup. It saves things in simple files. You can make an isolated catalog if you want, and it too is saved in a simple file. You can query what is in the files and where. You can restore from all or part of the files. You can simply play the backups forward, in order, to achieve a full and consistent restore. Or you can load data about them into dar_manager for an optimized restore. A bit of scripting will be necessary to make incrementals; finding the most recent backup or catalog. If backup files are named with care for instance, by date then this should be a pretty easy task. I haven t touched on resiliency yet. dar comes with tools for recovering archives that have had portions corrupted or lost. It can also rebuild the catalog if it is corrupted or lost. It adds tape marks (or escape sequences ) to the archive along with the data stream. So every entry in the catalog is actually stored in the archive twice: once alongside the file data, and once at the end in the collected catalog. This allows dar to scan a corrupted file for the tape marks and reconstruct whatever is still intact, even if the catalog is lost. dar also integrates with tools like sha256sum and par2 to simplify archive integrity testing and restoration. This balances against the need to use a tool (dar, optionally with a GUI frontend) to restore files. I ll discuss that more in the next post.

16 June 2023

John Goerzen: Using git-annex for Data Archiving

In my recent post about data archiving to removable media, I laid out the difference between backing up and archiving, and also said I d evaluate git-annex and dar. This post evaluates git-annex. The next will look at dar, and then I ll make a comparison post. What is git-annex? git-annex is a fantastic and versatile program that does well, it s one of those things that can do so much that it s a bit hard to describe. Its homepage says:

git-annex allows managing large files with git, without storing the file contents in git. It can sync, backup, and archive your data, offline and online. Checksums and encryption keep your data safe and secure. Bring the power and distributed nature of git to bear on your large files with git-annex.

I think the particularly interesting features of git-annex aren t actually included in that list. Among the features of git-annex that make it shine for this purpose, its location tracking is key. git-annex can know exactly which device has which file at which version at all times. Combined with its preferred content settings, this lets you very easily say things like:

I want exactly 1 copy of every file to exist within the set #1 of backup drives. Here s a drive in that set; copy to it whatever needs to be copied to satisfy that requirement.
Now I have another set of backup drives. Periodically I will swap sets offsite. Copy whatever is needed to this drive in the second set, making sure that there is 1 copy of every file within this set as well, regardless of what s in the first set.
Here s a directory I want to use to track the status of everything else. I don t want any copies at all here.

git-annex can be set to allow a configurable amount of free space to remain on a device, and it will fill it up with whatever copies are necessary up until it hits that limit. Very convenient! git-annex will store files in a folder structure that mirrors the origin folder structure, in plain files just as they were. This maximizes the ability for a future person to access the content, since it is all viewable without any special tool at all. Of course, for things like optical media, git-annex will essentially be creating what amounts to incrementals. To obtain a consistent copy of the original tree, you would still need to use git-annex to process (export) the archives. git-annex challenges In my prior post, I related some challenges with git-annex. The biggest of them quite poor performance of the directory special remote when dealing with many files has been resolved by Joey, git-annex s author! That dramatically improves the git-annex use scenario here! The fixing commit is in the source tree but not yet in a release. git-annex no doubt may still have performance challenges with repositories in the 100,000+-range, but in that order of magnitude it now looks usable. I m not sure about 1,000,000-file repositories (I haven t tested); there is a page about scalability. A few other more minor challenges remain:

git-annex doesn t really preserve POSIX attributes; for instance, permissions, symlink destinations, and timestamps are all not preserved. Of these, timestamps are the most important for my particular use case.
If your data set to archive contains Git repositories itself, these will not be included.

I worked around the timestamp issue by using the mtree-netbsd package in Debian. mtree writes out a summary of files and metadata in a tree, and can restore them. To save: mtree -c -R nlink,uid,gid,mode -p /PATH/TO/REPO -X <(echo './.git') > /tmp/spec And, after restoration, the timestamps can be applied with: mtree -t -U -e < /tmp/spec Walkthrough: initial setup To use git-annex in this way, we have to do some setup. My general approach is this:

There is a source of data that lives outside git-annex. I'll call this $SOURCEDIR.
I'm going to name the directories holding my data $REPONAME.
There will be a "coordination" git-annex repo. It will hold metadata only, and no data. This will let us track where things live. I'll call it $METAREPO.
There will be drives. For this example, I'll call their mountpoints $DRIVE01 and $DRIVE02. For easy demonstration purposes, I used a ZFS dataset with a refquota set (to observe the size handling), but I could have as easily used a LVM volume, btrfs dataset, loopback filesystem, or USB drive. For optical discs, this would be a staging area or a UDF filesystem.

Let's get started! I've set all these shell variables appropriately for this example, and REPONAME to "testdata". We'll begin by setting up the metadata-only tracking repo.



$ REPONAME=testdata

$ mkdir "$METAREPO"

$ cd "$METAREPO"

$ git init

$ git config annex.thin true

There is a sort of complicated topic of how git-annex stores files in a repo, which varies depending on whether the data for the file is present in a given repo, and whether the file is locked or unlocked. Basically, the options I use here cause git-annex to mostly use hard links instead of symlinks or pointer files, for maximum compatibility with non-POSIX filesystems such as NTFS and UDF, which might be used on these devices. thin is part of that. Let's continue:



$ git annex init 'local hub'

init local hub ok

(recording state in git...)

$ git annex wanted . "include=* and exclude=$REPONAME/*"

wanted . ok

(recording state in git...)

In a bit, we are going to import the source data under the directory named $REPONAME (here, testdata). The wanted command says: in this repository (represented by the bare dot), the files we want are matched by the rule that says eveyrthing except what's under $REPONAME. In other words, we don't want to make an unnecessary copy here. Because I expect to use an mtree file as documented above, and it is not under $REPONAME/, it will be included. Let's just add it and tweak some things.



$ touch mtree

$ git annex add mtree

add mtree

ok

(recording state in git...)

$ git annex sync

git-annex sync will change default behavior to operate on --content in a future version of git-annex. Recommend you explicitly use --no-content (or -g) to prepare for that change. (Or you can configure annex.synccontent)

commit

[main (root-commit) 6044742] git-annex in local hub

 1 file changed, 1 insertion(+)

 create mode 120000 mtree

ok

$ ls -l

total 9

lrwxrwxrwx 1 jgoerzen jgoerzen 178 Jun 15 22:31 mtree -> .git/annex/objects/pX/ZJ/...

OK! We've added a file, and it got transformed into a symlink. That's the thing I said we were going to avoid, so:



 git annex adjust --unlock-present

adjust

Switched to branch 'adjusted/main(unlockpresent)'

ok

$ ls -l

total 1

-rw-r--r-- 2 jgoerzen jgoerzen 0 Jun 15 22:31 mtree

You'll notice it transformed into a hard link (nlinks=2) file. Great! Now let's import the source data. For that, we'll use the directory special remote.



$ git annex initremote source type=directory  directory=$SOURCEDIR importtree=yes \

   encryption=none

initremote source ok

(recording state in git...)

$ git annex enableremote source directory=$SOURCEDIR

enableremote source ok

(recording state in git...)

$ git config remote.source.annex-readonly true

$ git config annex.securehashesonly true

$ git config annex.genmetadata true

$ git config annex.diskreserve 100M

$ git config remote.source.annex-tracking-branch main:$REPONAME

OK, so here we created a new remote named "source". We enabled it, and set some configuration. Most notably, that last line causes files from "source" to be imported under $REPONAME/ as we wanted earlier. Now we're ready to scan the source.



$ git annex sync

At this point, you'll see git-annex computing a hash for every file in the source directory. I can verify with du that my metadata-only repo only uses 14MB of disk space, while my source is around 4GB. Now we can see what git-annex thinks about file locations:



$ git-annex whereis   less

whereis mtree (1 copy)

        8aed01c5-da30-46c0-8357-1e8a94f67ed6 -- local hub [here]

ok

whereis testdata/[redacted] (0 copies)

  The following untrusted locations may also have copies:

        9e48387e-b096-400a-8555-a3caf5b70a64 -- [source]

failed

... many more lines ...

So remember we said we wanted mtree, but nothing under testdata, under this repo? That's exactly what we got. git-annex knows that the files under testdata can be found under the "source" special remote, but aren't in any git-annex repo -- yet. Now we'll start adding them. Walkthrough: removable drives I've set up two 500MB filesystems to represent removable drives. We'll see how git-annex works with them.



$ cd $DRIVE01

$ df -h .

Filesystem                     Size  Used Avail Use% Mounted on

acrypt/no-backup/annexdrive01  500M  1.0M  499M   1% /acrypt/no-backup/annexdrive01

$ git clone $METAREPO

Cloning into 'testdata'...

done.

$ cd $REPONAME

$ git config annex.thin true

$ git annex init "test drive #1"

$ git annex adjust --hide-missing --unlock

adjust

Switched to branch 'adjusted/main(hidemissing-unlocked)'

ok

$ git annex sync

OK, that's the initial setup. Now let's enable the source remote and configure it the same way we did before:



$ git annex enableremote source directory=$SOURCEDIR

enableremote source ok

(recording state in git...)

$ git config remote.source.annex-readonly true

$ git config remote.source.annex-tracking-branch main:$REPONAME

$ git config annex.securehashesonly true

$ git config annex.genmetadata true

$ git config annex.diskreserve 100M

Now, we'll add the drive to a group called "driveset01" and configure what we want on it:



$ git annex group . driveset01

$ git annex wanted . '(not copies=driveset01:1)'

What this does is say: first of all, this drive is in a group named driveset01. Then, this drive wants any files for which there isn't already at least one copy in driveset01. Now let's load up some files!



$ git annex sync --content

As the messages fly by from here, you'll see it mentioning that it got mtree, and then various files from "source" -- until, that is, the filesystem had less than 100MB free, at which point it complained of no space for the rest. Exactly like we wanted! Now, we need to teach $METAREPO about $DRIVE01.



$ cd $METAREPO

$ git remote add drive01 $DRIVE01/$REPONAME

$ git annex sync drive01

git-annex sync will change default behavior to operate on --content in a future version of git-annex. Recommend you explicitly use --no-content (or -g) to prepare for that change. (Or you can configure annex.synccontent)

commit

On branch adjusted/main(unlockpresent)

nothing to commit, working tree clean

ok

merge synced/main (Merging into main...)

Updating d1d9e53..817befc

Fast-forward

(Merging into adjusted branch...)

Updating 7ccc20b..861aa60

Fast-forward

ok

pull drive01

remote: Enumerating objects: 214, done.

remote: Counting objects: 100% (214/214), done.

remote: Compressing objects: 100% (95/95), done.

remote: Total 110 (delta 6), reused 0 (delta 0), pack-reused 0

Receiving objects: 100% (110/110), 13.01 KiB   1.44 MiB/s, done.

Resolving deltas: 100% (6/6), completed with 6 local objects.

From /acrypt/no-backup/annexdrive01/testdata

 * [new branch]      adjusted/main(hidemissing-unlocked) -> drive01/adjusted/main(hidemissing-unlocked)

 * [new branch]      adjusted/main(unlockpresent)        -> drive01/adjusted/main(unlockpresent)

 * [new branch]      git-annex                           -> drive01/git-annex

 * [new branch]      main                                -> drive01/main

 * [new branch]      synced/main                         -> drive01/synced/main

ok

OK! This step is important, because drive01 and drive02 (which we'll set up shortly) won't necessarily be able to reach each other directly, due to not being plugged in simultaneously. Our $METAREPO, however, will know all about where every file is, so that the "wanted" settings can be correctly resolved. Let's see what things look like now:



$ git annex whereis   less

whereis mtree (2 copies)

        8aed01c5-da30-46c0-8357-1e8a94f67ed6 -- local hub [here]

        b46fc85c-c68e-4093-a66e-19dc99a7d5e7 -- test drive #1 [drive01]

ok

whereis testdata/[redacted] (1 copy)

        b46fc85c-c68e-4093-a66e-19dc99a7d5e7 -- test drive #1 [drive01]

  The following untrusted locations may also have copies:

        9e48387e-b096-400a-8555-a3caf5b70a64 -- [source]

ok

If I scroll down a bit, I'll see the files past the 400MB mark that didn't make it onto drive01. Let's add another example drive! Walkthrough: Adding a second drive The steps for $DRIVE02 are the same as we did before, just with drive02 instead of drive01, so I'll omit listing it all a second time. Now look at this excerpt from whereis:



whereis testdata/[redacted] (1 copy)

        b46fc85c-c68e-4093-a66e-19dc99a7d5e7 -- test drive #1 [drive01]


  The following untrusted locations may also have copies:

        9e48387e-b096-400a-8555-a3caf5b70a64 -- [source]

ok

whereis testdata/[redacted] (1 copy)

        c4540343-e3b5-4148-af46-3f612adda506 -- test drive #2 [drive02]

  The following untrusted locations may also have copies:

        9e48387e-b096-400a-8555-a3caf5b70a64 -- [source]

ok

Look at that! Some files on drive01, some on drive02, some neither place. Perfect! Walkthrough: Updates So I've made some changes in the source directory: moved a file, added another, and deleted one. All of these were copied to drive01 above. How do we handle this? First, we update the metadata repo:



$ cd $METAREPO

$ git annex sync

$ git annex dropunused all

OK, this has scanned $SOURCEDIR and noted changes. Let's see what whereis says:



$ git annex whereis   less

...

whereis testdata/cp (0 copies)

  The following untrusted locations may also have copies:

        9e48387e-b096-400a-8555-a3caf5b70a64 -- [source]

failed

whereis testdata/file01-unchanged (1 copy)

        b46fc85c-c68e-4093-a66e-19dc99a7d5e7 -- test drive #1 [drive01]

  The following untrusted locations may also have copies:

        9e48387e-b096-400a-8555-a3caf5b70a64 -- [source]

ok

So this looks right. The file I added was a copy of /bin/cp. I moved another file to one named file01-unchanged. Notice that it realized this was a rename and that the data still exists on drive01. Well, let's update drive01.



$ cd $DRIVE01/$REPONAME

$ git annex sync --content

Looking at the testdata/ directory now, I see that file01-unchanged has been renamed, the deleted file is gone, but cp isn't yet here -- probably due to space issues; as it's new, it's undefined whether it or some other file would fill up free space. Let's work along a few more commands.



$ git annex get --auto

$ git annex drop --auto

$ git annex dropunused all

And now, let's make sure metarepo is updated with its state.



$ cd $METAREPO

$ git annex sync

We could do the same for drive02. This is how we would proceed with every update. Walkthrough: Restoration Now, we have bare files at reasonable locations in drive01 and drive02. But, to generate a consistent restore, we need to be able to actually do an export. Otherwise, we may have files with old names, duplicate files, etc. Let's assume that we lost our source and metadata repos and have to restore from scratch. We'll make a new $RESTOREDIR. We'll begin with drive01 since we used it most recently.



$ mv $METAREPO $METAREPO.disabled

$ mv $SOURCEDIR $SOURCEDIR.disabled

$ git clone $DRIVE01/$REPONAME $RESTOREDIR

$ cd $RESTOREDIR

$ git config annex.thin true

$ git annex init "restore"

$ git annex adjust --hide-missing --unlock

Now, we need to connect the drive01 and pull the files from it.



$ git remote add drive01 $DRIVE01/$REPONAME

$ git annex sync --content

Now, repeat with drive02:



$ git remote add drive02 $DRIVE02/$REPONAME

$ git annex sync --content

Now we've got all our content back! Here's what whereis looks like:



whereis testdata/file01-unchanged (3 copies)

        3d663d0f-1a69-4943-8eb1-f4fe22dc4349 -- restore [here]

        9e48387e-b096-400a-8555-a3caf5b70a64 -- source

        b46fc85c-c68e-4093-a66e-19dc99a7d5e7 -- test drive #1 [origin]

ok

...

I was a little surprised that drive01 didn't seem to know what was on drive02. Perhaps that could have been remedied by adding more remotes there? I'm not entirely sure; I'd thought would have been able to do that automatically. Conclusions I think I have demonstrated two things: First, git-annex is indeed an extremely powerful tool. I have only scratched the surface here. The location tracking is a neat feature, and being able to just access the data as plain files if all else fails is nice for future users. Secondly, it is also a complex tool and difficult to get right for this purpose (I think much easier for some other purposes). For someone that doesn't live and breathe git-annex, it can be hard to get right. In fact, I'm not entirely sure I got it right here. Why didn't drive02 know what files were on drive01 and vice-versa? I don't know, and that reflects some kind of misunderstanding on my part about how metadata is synced; perhaps more care needs to be taken in restore, or done in a different order, than I proposed. I initially tried to do a restore by using git annex export to a directory special remote with exporttree=yes, but I couldn't ever get it to actually do anything, and I don't know why. These two cut against each other. On the one hand, the raw accessibility of the data to someone with no computer skills is unmatched. On the other hand, I'm not certain I have the skill to always prepare the discs properly, or to do a proper consistent restore.

1 June 2023

Paul Wise: FLOSS Activities May 2023

Focus This month I didn't have any particular focus. I just worked on issues in my info bubble.

Changes

gensim: cancel obsolete GitHub jobs

i2c-tools: fix links

notmuch-mutt: fix data loss, encoding

foxtrotgps: gpsd API bump

flower: drop bootstrap/celery branding

pass-otp: fix typos, links, errors, warnings, safety

git-imerge: fix tests, typo, lists of versions, distros

duck: indicators

Debian website: recommend pseudo-header for X-Debbugs-CC

Debian screenshots:

deleted pypy (just website)

Debian QA services: fix crash

Debian BTS: allow multiple X-Debbugs-CC lines, improve docs

Debian BTS usertags: fix some porting, release, DSA usertags

Debian package uploads: sptag

Debian wiki pages: accessibility-devel, Brasil/IRC, Firmware/Updates, InstallingDebianOn/Wandboard, InstructionSelection, PortsDocs/New, Ports/hurd-amd64, Ports/loong64, SIMDEverywhere, Teams/RustPackaging

FOSSjobs wiki pages: Resources

Issues

SIMDe: incorrect macros/definitions/names, GCC warnings

Review

Spam: reported 3 Debian bug reports and 22 Debian mailing list posts

Debian packages: sponsored trurl

Debian wiki: RecentChanges for the month

Debian BTS usertags: changes for the month

Debian screenshots:

approved 3d-ascii-viewer angelfish autokey-gtk convert-pgn crystal-facet-uml deborphan diodon gcc glewlwyd glob2 kdiskmark loook man-db nemo ntpdate ntpsec-ntpdate paperwork-gtk pdftk-java picom python2.7 python3-tk python-tk scite software-properties-gtk stockfish stress-ng sudoku-solver util-linux-extra wmweather x11-apps xchat youtubedl-gui

rejected pypy (website), fonts-atarismall (UI not just font), gtk-theme-config (photo), kamailio (diagram not interface), node-connect (bug report), laditools/debian-installer/weboob-qt (artwork), xserver-xorg-input-synaptics/ubuntustudio-menu (full desktop), freecell-solver-bin (mobile app), vorbis-tools (manual page), veroroute (macOS?), ttf-mscorefonts-installer (better existing screenshot), gir1.2-gdkpixbuf-2.0 (random mobile website), chromium-browser (random website), mixer.app (aspirational text?), stockfish/pulseaudio-utils (selfie)

Administration

Debian IRC: set topic on new #debian-sa channel

Debian wiki: unblock IP addresses, approve accounts

Communication

Respond to queries from Debian users and contributors on the mailing lists and IRC

Sponsors The SIMDe, gensim, sptag work was sponsored. All other work was done on a volunteer basis.

31 May 2023

Russ Allbery: Review: Night Watch

Review: Night Watch, by Terry Pratchett

Series:	Discworld #29
Publisher:	Harper
Copyright:	November 2002
Printing:	August 2014
ISBN:	0-06-230740-1
Format:	Mass market
Pages:	451

Night Watch is the 29th Discworld novel and the sixth Watch novel. I would really like to tell people they could start here if they wanted to, for reasons that I will get into in a moment, but I think I would be doing you a disservice. The emotional heft added by having read the previous Watch novels and followed Vimes's character evolution is significant. It's the 25th of May. Vimes is about to become a father. He and several of the other members of the Watch are wearing sprigs of lilac for reasons that Sergeant Colon is quite vehemently uninterested in explaining. A serial killer named Carcer the Watch has been after for weeks has just murdered an off-duty sergeant. It's a tense and awkward sort of day and Vimes is feeling weird and wistful, remembering the days when he was a copper and not a manager who has to dress up in ceremonial armor and meet with committees. That may be part of why, when the message comes over the clacks that the Watch have Carcer cornered on the roof of the New Hall of the Unseen University, Vimes responds in person. He's grappling with Carcer on the roof of the University Library in the middle of a magical storm when lightning strikes. When he wakes up, he's in the past, shortly after he joined the Watch and shortly before the events of the 25th of May that the older Watch members so vividly remember and don't talk about. I have been saying recently in Discworld reviews that it felt like Pratchett was on the verge of a breakout book that's head and shoulders above Discworld prior to that point. This is it. This is that book. The setup here is masterful: the sprigs of lilac that slowly tell the reader something is going on, the refusal of any of the older Watch members to talk about it, the scene in the graveyard to establish the stakes, the disconcerting fact that Vetinari is wearing a sprig of lilac as well, and the feeling of building tension that matches the growing electrical storm. And Pratchett never gives into the temptation to explain everything and tip his hand prematurely. We know the 25th is coming and something is going to happen, and the reader can put together hints from Vimes's thoughts, but Pratchett lets us guess and sometimes be right and sometimes be wrong. Vimes is trying to change history, which adds another layer of uncertainty and enjoyment as the reader tries to piece together both the true history and the changes. This is a masterful job at a "what if?" story. And, beneath that, the commentary on policing and government and ethics is astonishingly good. In a review of an earlier Watch novel, I compared Pratchett to Dickens in the way that he focuses on a sort of common-sense morality rather than political theory. That is true here too, but oh that moral analysis is sharp enough to slide into you like a knife. This is not the Vimes that we first met in Guards! Guards!. He has has turned his cynical stubbornness into a working theory of policing, and it's subtle and complicated and full of nuance that he only barely knows how to explain. But he knows how to show it to people.

Keep the peace. That was the thing. People often failed to understand what that meant. You'd go to some life-threatening disturbance like a couple of neighbors scrapping in the street over who owned the hedge between their properties, and they'd both be bursting with aggrieved self-righteousness, both yelling, their wives would either be having a private scrap on the side or would have adjourned to a kitchen for a shared pot of tea and a chat, and they all expected you to sort it out. And they could never understand that it wasn't your job. Sorting it out was a job for a good surveyor and a couple of lawyers, maybe. Your job was to quell the impulse to bang their stupid fat heads together, to ignore the affronted speeches of dodgy self-justification, to get them to stop shouting and to get them off the street. Once that had been achieved, your job was over. You weren't some walking god, dispensing finely tuned natural justice. Your job was simply to bring back peace.

When Vimes is thrown back in time, he has to pick up the role of his own mentor, the person who taught him what policing should be like. His younger self is right there, watching everything he does, and he's desperately afraid he'll screw it up and set a worse example. Make history worse when he's trying to make it better. It's a beautifully well-done bit of tension that uses time travel as the hook to show both how difficult mentorship is and also how irritating one's earlier naive self would be.

He wondered if it was at all possible to give this idiot some lessons in basic politics. That was always the dream, wasn't it? "I wish I'd known then what I know now"? But when you got older you found out that you now wasn't you then. You then was a twerp. You then was what you had to be to start out on the rocky road of becoming you now, and one of the rocky patches on that road was being a twerp.

The backdrop of this story, as advertised by the map at the front of the book, is a revolution of sorts. And the revolution does matter, but not in the obvious way. It creates space and circumstance for some other things to happen that are all about the abuse of policing as a tool of politics rather than Vimes's principle of keeping the peace. I mentioned when reviewing Men at Arms that it was an awkward book to read in the United States in 2020. This book tackles the ethics of policing head-on, in exactly the way that book didn't. It's also a marvelous bit of competence porn. Somehow over the years, Vimes has become extremely good at what he does, and not just in the obvious cop-walking-a-beat sort of ways. He's become a leader. It's not something he thinks about, even when thrown back in time, but it's something Pratchett can show the reader directly, and have the other characters in the book comment on. There is so much more that I'd like to say, but so much would be spoilers, and I think Night Watch is more effective when you have the suspense of slowly puzzling out what's going to happen. Pratchett's pacing is exquisite. It's also one of the rare Discworld novels where Pratchett fully commits to a point of view and lets Vimes tell the story. There are a few interludes with other people, but the only other significant protagonist is, quite fittingly, Vetinari. I won't say anything more about that except to note that the relationship between Vimes and Vetinari is one of the best bits of fascinating subtlety in all of Discworld. I think it's also telling that nothing about Night Watch reads as parody. Sure, there is a nod to Back to the Future in the lightning storm, and it's impossible to write a book about police and street revolutions without making the reader think about Les Miserables, but nothing about this plot matches either of those stories. This is Pratchett telling his own story in his own world, unapologetically, and without trying to wedge it into parody shape, and it is so much the better book for it. The one quibble I have with the book is that the bits with the Time Monks don't really work. Lu-Tze is annoying and flippant given the emotional stakes of this story, the interludes with him are frustrating and out of step with the rest of the book, and the time travel hand-waving doesn't add much. I see structurally why Pratchett put this in: it gives Vimes (and the reader) a time frame and a deadline, it establishes some of the ground rules and stakes, and it provides a couple of important opportunities for exposition so that the reader doesn't get lost. But it's not good story. The rest of the book is so amazingly good, though, that it doesn't matter (and the framing stories for "what if?" explorations almost never make much sense). The other thing I have a bit of a quibble with is outside the book. Night Watch, as you may have guessed by now, is the origin of the May 25th Pratchett memes that you will be familiar with if you've spent much time around SFF fandom. But this book is dramatically different from what I was expecting based on the memes. You will, for example see a lot of people posting "Truth, Justice, Freedom, Reasonably Priced Love, And a Hard-Boiled Egg!", and before reading the book it sounds like a Pratchett-style humorous revolutionary slogan. And I guess it is, sort of, but, well... I have to quote the scene:

"You'd like Freedom, Truth, and Justice, wouldn't you, Comrade Sergeant?" said Reg encouragingly. "I'd like a hard-boiled egg," said Vimes, shaking the match out. There was some nervous laughter, but Reg looked offended. "In the circumstances, Sergeant, I think we should set our sights a little higher " "Well, yes, we could," said Vimes, coming down the steps. He glanced at the sheets of papers in front of Reg. The man cared. He really did. And he was serious. He really was. "But...well, Reg, tomorrow the sun will come up again, and I'm pretty sure that whatever happens we won't have found Freedom, and there won't be a whole lot of Justice, and I'm damn sure we won't have found Truth. But it's just possible that I might get a hard-boiled egg."

I think I'm feeling defensive of the heart of this book because it's such an emotional gut punch and says such complicated and nuanced things about politics and ethics (and such deeply cynical things about revolution). But I think if I were to try to represent this story in a meme, it would be the "angels rise up" song, with all the layers of meaning that it gains in this story. I'm still at the point where the lilac sprigs remind me of Sergeant Colon becoming quietly furious at the overstep of someone who wasn't there. There's one other thing I want to say about that scene: I'm not naturally on Vimes's side of this argument. I think it's important to note that Vimes's attitude throughout this book is profoundly, deeply conservative. The hard-boiled egg captures that perfectly: it's a bit of physical comfort, something you can buy or make, something that's part of the day-to-day wheels of the city that Vimes talks about elsewhere in Night Watch. It's a rejection of revolution, something that Vimes does elsewhere far more explicitly. Vimes is a cop. He is in some profound sense a defender of the status quo. He doesn't believe things are going to fundamentally change, and it's not clear he would want them to if they did. And yet. And yet, this is where Pratchett's Dickensian morality comes out. Vimes is a conservative at heart. He's grumpy and cynical and jaded and he doesn't like change. But if you put him in a situation where people are being hurt, he will break every rule and twist every principle to stop it.

He wanted to go home. He wanted it so much that he trembled at the thought. But if the price of that was selling good men to the night, if the price was filling those graves, if the price was not fighting with every trick he knew... then it was too high. It wasn't a decision that he was making, he knew. It was happening far below the areas of the brain that made decisions. It was something built in. There was no universe, anywhere, where a Sam Vimes would give in on this, because if he did then he wouldn't be Sam Vimes any more.

This is truly exceptional stuff. It is the best Discworld novel I have read, by far. I feel like this was the Watch novel that Pratchett was always trying to write, and he had to write five other novels first to figure out how to write it. And maybe to prepare Discworld readers to read it. There are a lot of Discworld novels that are great on their own merits, but also it is 100% worth reading all the Watch novels just so that you can read this book. Followed in publication order by The Wee Free Men and later, thematically, by Thud!. Rating: 10 out of 10

29 May 2023

John Goerzen: Recommendations for Tools for Backing Up and Archiving to Removable Media

I have several TB worth of family photos, videos, and other data. This needs to be backed up and archived. Backups and archives are often thought of as similar. And indeed, they may be done with the same tools at the same time. But the goals differ somewhat: Backups are designed to recover from a disaster that you can fairly rapidly detect. Archives are designed to survive for many years, protecting against disaster not only impacting the original equipment but also the original person that created them. Reflecting on this, it implies that while a nice ZFS snapshot-based scheme that supports twice-hourly backups may be fantastic for that purpose, if you think about things like family members being able to access it if you are incapacitated, or accessibility in a few decades time, it becomes much less appealing for archives. ZFS doesn t have the wide software support that NTFS, FAT, UDF, ISO-9660, etc. do. This post isn t about the pros and cons of the different storage media, nor is it about the pros and cons of cloud storage for archiving; these conversations can readily be found elsewhere. Let s assume, for the point of conversation, that we are considering BD-R optical discs as well as external HDDs, both of which are too small to hold the entire backup set. What would you use for archiving in these circumstances? Establishing goals The goals I have are:

Archives can be restored using Linux or Windows (even though I don t use Windows, this requirement will ensure the broadest compatibility in the future)
The archival system must be able to accommodate periodic updates consisting of new files, deleted files, moved files, and modified files, without requiring a rewrite of the entire archive dataset
Archives can ideally be mounted on any common OS and the component files directly copied off
Redundancy must be possible. In the worst case, one could manually copy one drive/disc to another. Ideally, the archiving system would automatically track making n copies of data.
While a full restore may be a goal, simply finding one file or one directory may also be a goal. Ideally, an archiving system would be able to quickly tell me which discs/drives contain a given file.
Ideally, preserves as much POSIX metadata as possible (hard links, symlinks, modification date, permissions, etc). However, for the archiving case, this is less important than for the backup case, with the possible exception of modification date.
Must be easy enough to do, and sufficiently automatable, to allow frequent updates without error-prone or time-consuming manual hassle

I would welcome your ideas for what to use. Below, I ll highlight different approaches I ve looked into and how they stack up. Basic copies of directories The initial approach might be one of simply copying directories across. This would work well if the data set to be archived is smaller than the archival media. In that case, you could just burn or rsync a new copy with every update and be done. Unfortunately, this is much less convenient with data of the size I m dealing with. rsync is unavailable in that case. With some datasets, you could manually design some rsyncs to store individual directories on individual devices, but that gets unwieldy fast and isn t scalable. You could use something like my datapacker program to split the data across multiple discs/drives efficiently. However, updates will be a problem; you d have to re-burn the entire set to get a consistent copy, or rely on external tools like mtree to reflect deletions. Not very convenient in any case. So I won t be using this. tar or zip While you can split tar and zip files across multiple media, they have a lot of issues. GNU tar s incremental mode is clunky and buggy; zip is even worse. tar files can t be read randomly, making it extremely time-consuming to extract just certain files out of a tar file. The only thing going for these formats (and especially zip) is the wide compatibility for restoration. dar Here we start to get into the more interesting tools. Dar is, in my opinion, one of the best Linux tools that few people know about. Since I first wrote about dar in 2008, it s added some interesting new features; among them, binary deltas and cloud storage support. So, dar has quite a few interesting features that I make use of in other ways, and could also be quite helpful here:

Dar can both read and write files sequentially (streaming, like tar), or with random-access (quick seek to extract a subset without having to read the entire archive)
Dar can apply compression to individual files, rather than to the archive as a whole, faciliting both random access and resilience (corruption in one file doesn t invalidate all subsequent files). Dar also supports numerous compression algorithms including gzip, bzip2, xz, lzo, etc., and can omit compressing already-compressed files.
The end of each dar file contains a central directory (dar calls this a catalog). The catalog contains everything necessary to extract individual files from the archive quickly, as well as everything necessary to make a future incremental archive based on this one. Additionally, dar can make and work with isolated catalogs a file containing the catalog only, without data.
Dar can split the archive into multiple pieces called slices. This can best be done with fixed-size slices ( slice and first-slice options), which let the catalog regord the slice number and preserves random access capabilities. With the execute option, dar can easily wait for a given slice to be burned, etc.
Dar normally stores an entire new copy of a modified file, but can optionally store an rdiff binary delta instead. This has the potential to be far smaller (think of a case of modifying metadata for a photo, for instance).

Additionally, dar comes with a dar_manager program. dar_manager makes a database out of dar catalogs (or archives). This can then be used to identify the precise archive containing a particular version of a particular file. All this combines to make a useful system for archiving. Isolated catalogs are tiny, and it would be easy enough to include the isolated catalogs for the entire set of archives that came before (or even the dar_manager database file) with each new incremental archive. This would make restoration of a particular subset easy. The main thing to address with dar is that you do need dar to extract the archive. Every dar release comes with source code and a win64 build. dar also supports building a statically-linked Linux binary. It would therefore be easy to include win64 binary, Linux binary, and source with every archive run. dar is also a part of multiple Linux and BSD distributions, which are archived around the Internet. I think this provides a reasonable future-proofing to make sure dar archives will still be readable in the future. The other challenge is user ability. While dar is highly portable, it is fundamentally a CLI tool and will require CLI abilities on the part of users. I suspect, though, that I could write up a few pages of instructions to include and make that a reasonably easy process. Not everyone can use a CLI, but I would expect a person that could follow those instructions could be readily-enough found. One other benefit of dar is that it could easily be used with tapes. The LTO series is liked by various hobbyists, though it could pose formidable obstacles to non-hobbyists trying to aceess data in future decades. Additionally, since the archive is a big file, it lends itself to working with par2 to provide redundancy for certain amounts of data corruption. git-annex git-annex is an interesting program that is designed to facilitate managing large sets of data and moving it between repositories. git-annex has particular support for offline archive drives and tracks which drives contain which files. The idea would be to store the data to be archived in a git-annex repository. Then git-annex commands could generate filesystem trees on the external drives (or trees to br burned to read-only media). In a post about using git-annex for blu-ray backups, an earlier thread about DVD-Rs was mentioned. This has a few interesting properties. For one, with due care, the files can be stored on archival media as regular files. There are some different options for how to generate the archives; some of them would place the entire git-annex metadata on each drive/disc. With that arrangement, one could access the individual files without git-annex. With git-annex, one could reconstruct the final (or any intermediate) state of the archive appropriately, handling deltions, renames, etc. You would also easily be able to know where copies of your files are. The practice is somewhat more challenging. Hundreds of thousands of files what I would consider a medium-sized archive can pose some challenges, running into hours-long execution if used in conjunction with the directory special remote (but only minutes-long with a standard git-annex repo). Ruling out the directory special remote, I had thought I could maybe just work with my files in git-annex directly. However, I ran into some challenges with that approach as well. I am uncomfortable with git-annex mucking about with hard links in my source data. While it does try to preserve timestamps in the source data, these are lost on the clones. I wrote up my best effort to work around all this. In a forum post, the author of git-annex comments that I don t think that CDs/DVDs are a particularly good fit for git-annex, but it seems a couple of users have gotten something working. The page he references is Managing a large number of files archived on many pieces of read-only medium. Some of that discussion is a bit dated (for instance, the directory special remote has the importtree feature that implements what was being asked for there), but has some interesting tips. git-annex supplies win64 binaries, and git-annex is included with many distributions as well. So it should be nearly as accessible as dar in the future. Since git-annex would be required to restore a consistent recovery image, similar caveats as with dar apply; CLI experience would be needed, along with some written instructions. Bacula and BareOS Although primarily tape-based archivers, these do also also nominally support drives and optical media. However, they are much more tailored as backup tools, especially with the ability to pull from multiple machines. They require a database and extensive configuration, making them a poor fit for both the creation and future extractability of this project. Conclusions I m going to spend some more time with dar and git-annex, testing them out, and hope to write some future posts about my experiences.

11 May 2023

Simon Josefsson: Streamlined NTRU Prime sntrup761 goes to IETF

The OpenSSH project added support for a hybrid Streamlined NTRU Prime post-quantum key encapsulation method sntrup761 to strengthen their X25519-based default in their version 8.5 released on 2021-03-03. While there has been a lot of talk about post-quantum crypto generally, my impression has been that there has been a slowdown in implementing and deploying them in the past two years. Why is that? Regardless of the answer, we can try to collaboratively change things, and one effort that appears strangely missing are IETF documents for these algorithms. Building on some earlier work that added X25519/X448 to SSH, writing a similar document was relatively straight-forward once I had spent a day reading OpenSSH and TinySSH source code to understand how it worked. While I am not perfectly happy with how the final key is derived from the sntrup761/X25519 secrets it is a SHA512 call on the concatenated secrets I think the construct deserves to be better documented, to pave the road for increased confidence or better designs. Also, reusing the RFC5656 4 structs makes for a worse specification (one unnecessary normative reference), but probably a simpler implementation. I have published draft-josefsson-ntruprime-ssh-00 here. Credit here goes to Jan Moj of TinySSH that designed the earlier sntrup4591761x25519-sha512@tinyssh.org in 2018, Markus Friedl who added it to OpenSSH in 2019, and Damien Miller that changed it to sntrup761 in 2020. Does anyone have more to add to the history of this work? Once I had sharpened my xml2rfc skills, preparing a document describing the hybrid construct between the sntrup761 key-encapsulation mechanism and the X25519 key agreement method in a non-SSH fashion was easy. I do not know if this work is useful, but it may serve as a reference for further study. I published draft-josefsson-ntruprime-hybrid-00 here. Finally, how about a IETF document on the base Streamlined NTRU Prime? Explaining all the details, and especially the math behind it would be a significant effort. I started doing that, but realized it is a subjective call when to stop explaining things. If we can t assume that the reader knows about lattice math, is a document like this the best place to teach it? I settled for the most minimal approach instead, merely giving an introduction to the algorithm, included SageMath and C reference implementations together with test vectors. The IETF audience rarely understands math, so I think it is better to focus on the bits on the wire and the algorithm interfaces. Everything here was created by the Streamlined NTRU Prime team, I merely modified it a bit hoping I didn t break too much. I have now published draft-josefsson-ntruprime-streamlined-00 here. I maintain the IETF documents on my ietf-ntruprime GitLab page, feel free to open merge requests or raise issues to help improve them. To have confidence in the code was working properly, I ended up preparing a branch with sntrup761 for the GNU-project Nettle and have submitted it upstream for review. I had the misfortune of having to understand and implement NIST s DRBG-CTR to compute the sntrup761 known-answer tests, and what a mess it is. Why does a deterministic random generator support re-seeding? Why does it support non-full entropy derivation? What s with the key size vs block size confusion? What s with the optional parameters? What s with having multiple algorithm descriptions? Luckily I was able to extract a minimal but working implementation that is easy to read. I can t locate DRBG-CTR test vectors, anyone? Does anyone have sntrup761 test vectors that doesn t use DRBG-CTR? One final reflection on publishing known-answer tests for an algorithm that uses random data: are the test vectors stable over different ways to implement the algorithm? Just consider of some optimization moved one randomness-extraction call before another, then wouldn t the output be different? Are there other ways to verify correctness of implementations? As always, happy hacking!

3 May 2023

John Goerzen: Martha the Pilot

Martha, now 5, can t remember a time when she didn t fly periodically. She s come along in our airplane in short flights to a nearby restaurant and long ones to Michigan and South Dakota. All this time, she s been riding in the back seat next to Laura. Martha has been talking excitedly about riding up front next to me. She wants to be my co-pilot . I promised to give her an airplane wing pin when she did one I got from a pilot of a commercial flight when I was a kid. Of course, safety was first, so I wanted to be sure she was old enough to fly there without being a distraction. Last weekend, the moment finally arrived. She was so excited! She brought along her Claire bear aviator, one that I bought for her at an airport a little while back. She buckled in two of her dolls in the back seat. Martha's dolls

And then up we went! Martha in the airplane

Martha was so proud when we landed! We went to Stearman Field, just a short 10-minute flight away, and parked the plane right in front of the restaurant.

We flew back, and Martha thought we should get a photo of her standing on the wing by the door. Great idea! Martha standing on the wing

She was happily jabbering about the flight all the way home. She told us several times about the pin she got, watching out the window, watching all the screens in the airplane, and also that she didn t get sick at all despite some turbulence. And, she says, Now just you and I can go flying! Yes, that s something I m looking forward to!

29 April 2023

Russ Allbery: INN 2.7.1

This is a bug fix and minor feature release over INN 2.7.0, and the upgrade should be painless. You can download the new release from ISC or my personal INN pages. The latter also has links to the full changelog and the other INN documentation. As of this release, we're no longer generating hashes and signed hashes. Instead, the release is a simple tarball and a detached GnuPG signature, similar to my other software releases. We're also maintaining the releases in parallel on GitHub. For the full list of changes, see the INN 2.7.1 NEWS file. As always, thanks to Julien LIE for preparing this release and doing most of the maintenance work on INN!

Next.

Previous.